How to Integrate Google Searches into Your Application
By Klaus Salchner www.csharphelp.com
Introduction
The first thing that comes to mind when we hear Google is search engine. Google has turned the search business upside down within the last five years. Its founders started with an idea in 1995 that became widely used and known in 1998/99. Today Google is the number one search engine. Like other organizations, Google is trying to establish itself as a platform rather than a solution. This means it provides the necessary tools and infrastructure so other people can build their own solutions on top of it. Google provides a web service interface which allows you to integrate Google searches right into your application. You can find out more about the Google web service API at http://www.google.ca/apis .
How to get started with the Google API
From the URL above you can download the developer's kit, which comes with a number of sample applications for different languages like .NET or Java. You also need a valid license key, which you pass along with every web service call. To obtain a Google license key visit http://www.google.ca/apis and select "Create Account" on the left side navigation bar. Create an account by entering your email address and a password. Google sends an email to the address you entered to verify that it exists. The email contains a link which completes the account creation by activating the account. When done, click on the continue link which brings you back to the account creation page. At the bottom of the page you see a link "sign in here". Follow the link and sign into your account with your email address and password. This shows a page confirming that a license key has been generated and sent to your email address. Should you lose your license key, sign in again and Google will resend it to your email address. The license key is free but limits you to 1,000 calls per day. This will be more than enough to get started. If you need to make more than 1,000 calls per day, contact Google.
How to reference the Google web service API in your project
Create your project in Visual Studio .NET and in the "Solution Explorer" pane right click on the project. In the popup menu select "Add Web Reference" and enter the following WSDL URL - http://api.google.com/GoogleSearch.wsdl . This checks that the WSDL exists, downloads it and shows you in the dialog the web methods made available by this web service. Under "Web reference name" enter the name of the web service reference, for example GoogleSearch. When done click "Add Reference" and you are ready to use the Google web service API. It is shown in the "Solution Explorer" under "Web References". You can right click on the web service reference and update it through the "Update Web Reference" menu item, or view it in the Object Browser through the "View in Object Browser" popup menu. This shows you that there are four different types available. The type GoogleSearchService exposes the actual web service calls you can make. It has three different web methods (plus the usual Begin/End methods if you want to call a web method asynchronously).
GoogleSearchService.doSpellingSuggestion()
When you open Google in your browser and search for a word or phrase, you sometimes see the phrase "Did you mean: [suggested search term]" at the top of the search results page. Google performs a spell check of the search term you entered and then shows you alternative spellings of it. This helps the user search for properly spelled words and phrases; the user can simply click on the suggestion to search for the corrected term. The Google web service also provides a web method to check for alternate spellings of a search term. Here is a code snippet:
public static string SpellingSuggestion(string Phrase)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // get the new spelling suggestion
    string SpellingSuggestion = GoogleService.doSpellingSuggestion(Key, Phrase);

    // null means we have no spelling suggestion
    if (SpellingSuggestion == null)
        SpellingSuggestion = Phrase;

    // release the web service object
    GoogleService.Dispose();

    return SpellingSuggestion;
}
First we create an instance of the GoogleSearchService class and then we call the web method doSpellingSuggestion(). The first argument is the Google license key and the second one is the search term. The web method returns the alternate spelling of the search term, or null if there is no alternate spelling. The code snippet above returns the alternate spelling if there is one and the original search term otherwise. At the end it calls Dispose() to free up the underlying unmanaged resources.
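The null fallback described above can be separated from the network call, which makes it testable without a license key. The following is a minimal sketch, not code from the sample application: the class name SpellingFallback and the lookup delegate are hypothetical stand-ins for the generated proxy's doSpellingSuggestion() call, and Func<> assumes .NET 3.5 or later.

```csharp
using System;

public static class SpellingFallback
{
    // 'lookup' stands in for a call such as
    //   phrase => GoogleService.doSpellingSuggestion(Key, phrase)
    // It is an assumption made so the sketch runs without the web service.
    public static string SuggestOrOriginal(string phrase, Func<string, string> lookup)
    {
        string suggestion = lookup(phrase);

        // null means the service has no alternate spelling
        return suggestion ?? phrase;
    }
}
```

In an application the delegate would wrap the real web method; in a unit test it can simply return a fixed string or null.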
GoogleSearchService.doGetCachedPage()
Google is constantly crawling the Internet to keep its search index and directory up to date. Google's crawler also caches the content locally on its servers and allows you to obtain the cached page, which is the content as it was when the crawler last visited that resource. URLs can point to many different resources, most typically to HTML pages, but these can also be Word documents, PDF files, PowerPoint slides, etc. The cached page is always in HTML format, so any resource other than HTML is converted to HTML. Here is a code snippet:
// requires: using System.IO;
public static void GetCachedPageAndSaveToFile(string PageUrl, string FileName)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // get the cached page content
    byte[] CachedPage = GoogleService.doGetCachedPage(Key, PageUrl);

    // file stream to write to the file & a binary writer to write data to the stream
    FileStream FileWriter = new FileStream(FileName, FileMode.Create);
    BinaryWriter Writer = new BinaryWriter(FileWriter);

    // write the page content to the file and close the streams
    Writer.Write(CachedPage);
    Writer.Close();
    FileWriter.Close();

    // release the web service object
    GoogleService.Dispose();
}
First we again create an instance of the GoogleSearchService class and then we call the web method doGetCachedPage(). We pass along the Google license key plus the URL of the page we are looking for. The page travels base64-encoded in the SOAP response; the generated proxy decodes it and returns a byte array which contains the HTML content of the cached page. Next we create a FileStream which we use to write the obtained page to a local file. With FileMode.Create we tell it to create the file, overwriting any existing file. Then we create a BinaryWriter which uses the FileStream as its output. We write the returned byte array to the BinaryWriter, which writes it to the FileStream, which in turn writes it to the local file. Then we close the BinaryWriter and the FileStream. At the end we again call Dispose() to free up the underlying unmanaged resources.
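On .NET 2.0 and later, the FileStream and BinaryWriter pair can be replaced by a single call to File.WriteAllBytes(), which also creates or overwrites the target file. This is an alternative sketch rather than the sample application's code; the class name CachedPageFile is hypothetical.

```csharp
using System.IO;

public static class CachedPageFile
{
    // Writes the cached page bytes to fileName, creating the file
    // or overwriting it if it already exists.
    public static void Save(byte[] cachedPage, string fileName)
    {
        File.WriteAllBytes(fileName, cachedPage);
    }
}
```

With the proxy from the snippet above, a call would look like CachedPageFile.Save(GoogleService.doGetCachedPage(Key, PageUrl), FileName).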
GoogleSearchService.doGoogleSearch()
The web method doGoogleSearch() allows you to perform searches. You pass along the search term and then certain filter criteria to restrict the content, for example to a specific country, language, topic, etc. Here are the arguments you pass along to the web method:
Key - The Google license key.
QueryTerm - The actual search term. This can be a simple word, a phrase (to search for the phrase you need to put it in double quotes, otherwise it searches for the occurrence of all individual words), a list of words (you can use the AND or OR operator; when no operator is used between the words AND is assumed), etc. You can also exclude words or phrases by putting a minus sign in front of them. The Google reference at http://www.google.ca/apis/reference.html explains all query term capabilities.
Start - A zero-based index of the first result to be returned. This allows you to page through the result set. A single call never returns more than MaxResults results, so you need to make multiple calls and set Start appropriately to get the next results. If you provide a user interface which allows the user to page through the complete result set, you would set Start accordingly to return the results for each page. For example the first call would set it to 0, the next to 10, followed by 20, etc. (assuming MaxResults is set to 10).
MaxResults - The maximum number of results to be returned by the query. This can be a value between one and ten.
Filter - When set to true it filters out duplicate or near-duplicate search results. Near-duplicate results are results with the same title and snippet (the snippet is the summary text shown for each search result). This also limits the number of search results coming from the same host: if a web site has ten records matching the search term, only the first two are returned (called host crowding).
Restricts - Allows you to restrict the search to results from one or more countries or one or more topics. For example you can restrict the search to content within the US by setting this value to "countryUS". You can restrict the search to content centered around Linux by setting this value to "linux". The Google reference at http://www.google.ca/apis/reference.html lists all the possible values.
SafeSearch - Filters out adult content when set to true.
LanguageRestrict - Allows you to restrict the search to one or more languages. The Google reference at http://www.google.ca/apis/reference.html lists all the possible values.
InputEncoding - This value is ignored. All requests should be encoded using UTF-8.
OutputEncoding - This value is ignored. All returned results are encoded using UTF-8.
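Because Start is zero-based, the Start value for a given page is simply the page number multiplied by MaxResults. The helper below is a small sketch of that arithmetic; the class and method names are hypothetical, and the range check reflects the one-to-ten limit on MaxResults stated above.

```csharp
using System;

public static class SearchPaging
{
    // Returns the zero-based Start value for a zero-based page number.
    public static int StartForPage(int pageNumber, int maxResults)
    {
        if (pageNumber < 0)
            throw new ArgumentOutOfRangeException("pageNumber");
        if (maxResults < 1 || maxResults > 10)
            throw new ArgumentOutOfRangeException("maxResults");

        return pageNumber * maxResults;
    }
}
```

With MaxResults set to 10, pages 0, 1 and 2 start at results 0, 10 and 20.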
This web method allows you to perform simple or complex search queries against Google. It also allows you to filter the search result as well as page through the search result. Here is a code snippet:
public static XmlNode Search(string QueryTerm, int Start, int MaxResults,
    bool Filter, string Restricts, bool SafeSearch, string LanguageRestrict,
    string InputEncoding, string OutputEncoding)
{
    // create an instance of the Google web service
    Google.GoogleSearchService GoogleService = new Google.GoogleSearchService();

    // perform search
    Google.GoogleSearchResult SearchResult = GoogleService.doGoogleSearch(Key,
        QueryTerm, Start, MaxResults, Filter, Restricts, SafeSearch,
        LanguageRestrict, InputEncoding, OutputEncoding);

    // we return the result back as an XML document
    XmlDocument ResultXml = CreateXmlDocument(SearchResultXmlNode);

    // add the search result
    StringValueOfObject(ResultXml.DocumentElement, SearchResult);

    // add the result elements and directory categories root node
    XmlElement ResultElementsParentNode =
        AddChildElement(ResultXml.DocumentElement, "ResultElements");
    XmlElement CategoriesParentNode =
        AddChildElement(ResultXml.DocumentElement, "DirectoryCategories");

    // now add all result elements
    foreach (Google.ResultElement ResultElement in SearchResult.resultElements)
        StringValueOfObject(ResultElementsParentNode, ResultElement);

    // now add all directory categories
    foreach (Google.DirectoryCategory DirectoryCategory in SearchResult.directoryCategories)
        StringValueOfObject(CategoriesParentNode, DirectoryCategory);

    // release the web service object
    GoogleService.Dispose();

    return ResultXml;
}
First we create an instance of the GoogleSearchService class and then we call the web method doGoogleSearch(). We pass along all the arguments as described above. This performs the search and returns its result as an instance of the GoogleSearchResult class. The code snippet then takes all values of the GoogleSearchResult object and puts them into an XML document. Please refer to the attached sample application for the complete code. First it creates an XML document with the method CreateXmlDocument(). It then calls the method StringValueOfObject(), which creates an XML element for the object in the XML document, using the name of the object as the name of the XML element. The method then uses reflection to walk the returned GoogleSearchResult object and, for each field it finds, adds an attribute holding that field's value to the created XML element.
The returned GoogleSearchResult object has two fields which hold arrays of ResultElement and DirectoryCategory objects. The method StringValueOfObject() is not able to walk the objects in those arrays. Therefore we create two parent XML elements in the XML document using the method AddChildElement(). We then loop through both arrays and call StringValueOfObject() for each object, so each object is converted to an XML element with all its fields as attributes. Finally we again call Dispose() to free up the underlying unmanaged resources and then return the XML document, which contains all search information of the GoogleSearchResult object. This enables you to run XPath queries against the search result XML document to find the required search result information.
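The reflection step and the XPath lookup can be sketched as follows. This is not the sample application's StringValueOfObject(); it is a simplified illustration under stated assumptions: SampleResult is a hypothetical stand-in for the generated ResultElement type, the root element name "SearchResult" is assumed (the article does not show the value of SearchResultXmlNode), and only public string and primitive fields are turned into attributes.

```csharp
using System;
using System.Globalization;
using System.Reflection;
using System.Xml;

// Hypothetical stand-in for the generated Google.ResultElement type.
public class SampleResult
{
    public string title;
    public string URL;
    public string hostName;
    public bool relatedInformationPresent;
}

public static class ResultXmlSketch
{
    // Appends an element named after the object's type to 'parent' and
    // copies each public string or primitive field into an attribute.
    public static XmlElement AppendAsElement(XmlNode parent, object source)
    {
        XmlDocument doc = parent.OwnerDocument ?? (XmlDocument)parent;
        Type type = source.GetType();
        XmlElement element = doc.CreateElement(type.Name);

        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            bool simple = field.FieldType.IsPrimitive || field.FieldType == typeof(string);
            object value = field.GetValue(source);

            // arrays, nested objects and null values are skipped
            if (!simple || value == null)
                continue;

            element.SetAttribute(field.Name,
                Convert.ToString(value, CultureInfo.InvariantCulture));
        }

        parent.AppendChild(element);
        return element;
    }

    // Builds a small document in the assumed shape with two results.
    public static XmlDocument BuildSample()
    {
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("SearchResult");
        doc.AppendChild(root);
        XmlElement results = doc.CreateElement("ResultElements");
        root.AppendChild(results);

        AppendAsElement(results, new SampleResult { title = "First", URL = "http://a.example/1", hostName = "a.example" });
        AppendAsElement(results, new SampleResult { title = "Second", URL = "http://b.example/2", hostName = "b.example" });
        return doc;
    }
}
```

Against the sample document, the XPath expression /SearchResult/ResultElements/SampleResult[@hostName='b.example']/@URL passed to SelectSingleNode() selects the URL attribute of the second result. With the real result document the element names would follow whatever names the sample application generates.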
The attached sample application
The attached sample application provides a wrapper class for all Google web methods. It also provides a simple user interface demonstrating the use of each web method. You can enter a search term and get alternate spelling suggestions, you can download the cached HTML page of a URL and display it and you can perform a search entering all the search arguments. Please make sure to obtain your own Google license key and enter it in the app.config file.
Summary
The Google web service API is very easy to use. It enables you to search the Internet from within your application. Complex query terms and filtering capabilities help keep the search results relevant to your application's needs. The Google web service is one of many emerging web services, like Amazon's or eBay's. By introducing a web service interface these companies have moved to a platform, enabling third parties to build solutions on top of them. For these companies an ever increasing number of requests and business transactions are coming through these web service interfaces. If you have comments on this article or this topic, please contact me at klaus_salchner@hotmail.com . I want to hear if you learned something new. Contact me if you have questions about this topic or article.
About the author
Klaus Salchner has worked for 14 years in the industry, nine years in Europe and another five years in North America. As a Senior Enterprise Architect with solid experience in enterprise software development, Klaus spends considerable time on performance, scalability, availability, maintainability, globalization/localization and security. The projects he has been involved in are used by more than a million users in 50 countries on three continents.
Klaus calls Vancouver, British Columbia his home at the moment. His next big goal is doing the New York marathon in 2005. Klaus is interested in guest speaking opportunities or as an author for .NET magazines or Web sites. He can be contacted at klaus_salchner@hotmail.com or http://www.enterprise-minds.com/ .
Enterprise application architecture and design consulting services are available. If you want to hear more about it contact me! Involve me in your projects and I will make a difference for you. Contact me if you have an idea for an article or research project. Also contact me if you want to co-author an article or join future research projects!