SMART Search Layer
Overview
The SMART search engine indexes real-time streams of updates from sensors and social networks (e.g. tweets), so that questions such as "where is live music going on that my friends are at?" can be answered. To facilitate this, we are creating a new search technology for SMART based on the Terrier platform. It connects with SMART edge nodes using RESTful APIs and provides visualisations of results in the form of reusable Web 2.0 mashups.
Infrastructure
The SMART search engine must deal with sensor updates from many edge nodes, with each edge node potentially running multiple analyses over many sensors. For this reason, the large amount of data arriving from edge nodes will be indexed using the Storm framework. Storm is a distributed processing environment, which we use to handle the streams of data in real time and to distribute the workload of indexing the social network and sensor streams with Terrier across multiple machines in a cluster. Terrier is enhanced to use real-time, in-memory indices, so that as soon as a message (e.g. a tweet) is posted, or an update from a sensor is received, it is indexed and made available for search. Moreover, to facilitate detecting the "unusualness" of an event from a sensor, standard search engine data structures will be augmented to monitor the periodicity of sensor observations. For instance, using such technology the search engine can learn that many cars on a street are expected at 8am, but not at 3am.
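The 8am versus 3am example can be illustrated with a minimal sketch, assuming that periodicity is tracked per sensor and per hour of day and that unusualness is measured as a z-score against that history. The class name HourlyBaseline, the sensor identifier and the car counts are hypothetical and are not part of Terrier, Storm or the SMART platform.

```python
import math
from collections import defaultdict


class HourlyBaseline:
    """Hypothetical helper: running mean/variance per (sensor, hour of day)."""

    def __init__(self):
        # (sensor_id, hour) -> [count, mean, sum of squared deviations]
        self._stats = defaultdict(lambda: [0, 0.0, 0.0])

    def update(self, sensor_id, hour, value):
        """Fold one observation into the history (Welford's online update)."""
        s = self._stats[(sensor_id, hour)]
        s[0] += 1
        delta = value - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (value - s[1])

    def unusualness(self, sensor_id, hour, value):
        """Absolute z-score of value for this hour; 0.0 with too little history."""
        count, mean, m2 = self._stats.get((sensor_id, hour), (0, 0.0, 0.0))
        if count < 2:
            return 0.0
        std = math.sqrt(m2 / (count - 1))
        if std == 0.0:
            return 0.0 if value == mean else math.inf
        return abs(value - mean) / std


baseline = HourlyBaseline()
for cars in (95, 100, 105, 98, 102):   # made-up counts observed at 08:00
    baseline.update("street-cam-1", 8, cars)
for cars in (1, 0, 2, 1, 1):           # made-up counts observed at 03:00
    baseline.update("street-cam-1", 3, cars)

busy_morning = baseline.unusualness("street-cam-1", 8, 100)
busy_night = baseline.unusualness("street-cam-1", 3, 100)
```

With this history, 100 cars at 8am scores close to zero while the same count at 3am scores very high, which is the kind of signal that could be stored alongside the index. A real deployment would also need to consider day-of-week effects and drift, which this sketch ignores.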
Retrieving Events
The SMART search engine offers a search interface to services and end users to retrieve interesting events in the physical world and associated relevant posts in the social networks. While what counts as an interesting event is subjective, the SMART search engine can make inferences about interestingness based on how unusual an event is, and by learning from training examples of interesting events. This integrates short-term event detection and longer-term periodic event detection technology. The SMART search engine will combine such features to rank all events currently happening, building upon modern learning-to-rank technologies, such as gradient-boosted regression trees or LambdaMART. We combine this state-of-the-art ranking technology with an efficient search engine that can respond quickly to many concurrent user queries. With this in mind, we will develop new retrieval techniques that are efficient yet effective, for instance inspired by dynamic pruning.
Anticipating and Running Queries
Once the SMART search engine has responded to a query, the user may wish to be updated with more recent events for this query as soon as they happen. In this way, a user's query continues to live after it was initially answered, which we call a "running query". The SMART search engine will deploy information filtering techniques to ensure that interesting events can be detected and retrieved for presentation to the user, along with appropriate HTTP-based communication infrastructure to push new events to the user applications, without damaging the overall efficiency of the search engine. Moreover, as users respond to real-world events by querying search engines, the SMART search engine can anticipate queries before they occur, depending on the context, such as the location and time of extremely unusual events.
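The running-query idea can be sketched as a small registry that matches each incoming event against the stored queries. This is only an illustration under stated assumptions: the class RunningQueries, the simple all-terms-must-match filter and the interestingness threshold are hypothetical stand-ins for the information filtering techniques mentioned above, and a plain Python callback stands in for the HTTP-based push channel.

```python
class RunningQueries:
    """Hypothetical registry of queries that stay alive after the first answer."""

    def __init__(self):
        self._queries = {}   # query_id -> (set of query terms, callback)
        self._next_id = 0

    def register(self, text, callback):
        """Store a query; callback is invoked for each matching new event."""
        query_id = self._next_id
        self._next_id += 1
        self._queries[query_id] = (set(text.lower().split()), callback)
        return query_id

    def cancel(self, query_id):
        """Stop pushing events for this query."""
        self._queries.pop(query_id, None)

    def on_event(self, event_text, interestingness, threshold=0.5):
        """Push the event to every query whose terms all occur in it.

        Events scoring below the (assumed) threshold are filtered out.
        Returns the number of deliveries made.
        """
        if interestingness < threshold:
            return 0
        event_terms = set(event_text.lower().split())
        delivered = 0
        for terms, callback in list(self._queries.values()):
            if terms <= event_terms:
                callback(event_text)
                delivered += 1
        return delivered


inbox = []                      # stands in for a user application
running = RunningQueries()
query_id = running.register("live music", inbox.append)

running.on_event("Live music starting at the city square", 0.9)
running.on_event("Traffic jam on main street", 0.9)   # no term match
running.on_event("live music rehearsal", 0.1)         # below threshold
```

After these three events, only the first one reaches the inbox. Scanning every registered query per event is linear in the number of queries; an efficient implementation would index the queries themselves, which this sketch does not attempt.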
The SMART Search Engine API
The API is a RESTful Web service that can be accessed by end users using a Web browser, or by other applications (e.g. mobile phone applications). Results are provided in an easily parsable format such as JSON or XML. The API also allows applications to register queries so that new results are pushed to them as soon as they become available.
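As an illustration of what "easily parsable" could look like, the following sketch serialises the same result list as JSON and as XML using only the Python standard library. The field names (id, title, score) and the element names (results, event) are assumptions made for this example; the actual response schema of the SMART API is not specified here.

```python
import json
import xml.etree.ElementTree as ET


def to_json(query, events):
    """Serialise a result list as a JSON document (assumed layout)."""
    return json.dumps({"query": query, "events": events}, sort_keys=True)


def to_xml(query, events):
    """Serialise the same result list as XML (assumed element names)."""
    root = ET.Element("results", {"query": query})
    for event in events:
        node = ET.SubElement(root, "event")
        for field, value in sorted(event.items()):
            ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical result for the query "live music".
events = [{"id": "e1", "title": "Live music at the square", "score": 0.92}]

json_body = to_json("live music", events)
xml_body = to_xml("live music", events)
```

A client can read either body with a stock parser, for example json.loads(json_body) or ET.fromstring(xml_body), without any SMART-specific library.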
Mashup Visualisations
The SMART search engine offers a service to visualise results as real-time mashups aggregated from the various types of streams. For instance, users will be able to view newly breaking events across the city with real-time balloon popups on maps. Mashups will support both SMART use cases, namely live news and security.