Almost everyone who works with sensor data processes and stores time series data. This post shares some of my experiences with Azure Time Series Insights, gained while migrating an existing application to it.
The “Pre Time Series Insights” era
In the past, it was difficult to store time series data in a way that allowed it to be retrieved later without much delay. We used all kinds of tricks to improve query performance for the frontend.
A typical pattern, which I used in the past, was to aggregate the data over slots of several minutes, hours and days and store them in different tables or collections. But even with the support of windowing functions in Azure Stream Analytics, the resulting query performance when reading the data was not impressive.
The following picture shows such an example:
The example uses Stream Analytics with window functions to aggregate the data over time and writes it to different storage collections containing the aggregates. This requires a lot of memory and processing power, and Stream Analytics is restricted to 5 outputs. The storage can range from CosmosDB with collections for the aggregates, to Azure SQL (DWH) with tables for the aggregates, or even table storage. The Query API represents a web service which queries the data, typically a backend API service for a single page web app or mobile apps.
Another drawback of these approaches is the tight coupling between the writer (Stream Analytics) and the reader (Query API). Each side needs logic to manage the different aggregates, which clutters the components around the data store and makes the solution hard to maintain. The Query API in the picture above is restricted to the granularity of the aggregations provided by the Stream Analytics job. Changing the granularity requires changes in two components, plus migration or recalculation of the stored aggregates.
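To make the coupling concrete, here is a minimal Python sketch of the pre-aggregation pattern. It is not the code from our application: the granularity names, the in-memory "collections" and the functions `pre_aggregate` and `query` are assumptions made purely for illustration. The writer computes averages over tumbling windows for a fixed set of granularities, and the reader can only serve what the writer has produced.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical granularities the writer pre-computes; the reader depends on these names.
GRANULARITIES = {
    "5min": timedelta(minutes=5),
    "1h": timedelta(hours=1),
    "1d": timedelta(days=1),
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, size):
    """Return the start of the tumbling window of length `size` that contains `ts`."""
    return ts - ((ts - EPOCH) % size)


def pre_aggregate(events):
    """Writer side: average (timestamp, value) events per window, one collection per granularity."""
    sums = {name: defaultdict(lambda: [0.0, 0]) for name in GRANULARITIES}
    for ts, value in events:
        for name, size in GRANULARITIES.items():
            acc = sums[name][window_start(ts, size)]
            acc[0] += value
            acc[1] += 1
    return {
        name: {start: total / count for start, (total, count) in windows.items()}
        for name, windows in sums.items()
    }


def query(collections, granularity):
    """Reader side: only granularities the writer has stored can be served."""
    if granularity not in collections:
        raise KeyError(f"no pre-computed aggregate for {granularity!r}")
    return sorted(collections[granularity].items())
```

In this sketch, asking `query` for a granularity such as "15min" fails until the writer is changed and the stored data is recalculated, which is exactly the maintenance problem described above.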
Migrating to Time Series Insights
The migration from the previous solution to Time Series Insights went very smoothly. We changed the implementation of the query API service, which is responsible for querying and retrieving the time series data for mobile apps. It was amazing how much code became obsolete and could be dropped. We got a much cleaner implementation. And best of all, the performance boost was amazing!
As a starting point we used the code from the article "Query data from the Azure Time Series Insights environment using C#". The only downside was the buggy documentation of the Query API and the Query Syntax, especially the part related to aggregates. From a developer's perspective, a Swagger API would be a huge benefit. If you agree, you can vote for my suggestion for a Swagger API for Time Series Insights.
Where Time Series Insights shines
The most important advantage of Time Series Insights is its query performance. Especially important for our use cases is the ability to retrieve data aggregated over time using a dateHistogram aggregate without noticeable delay. Here Time Series Insights shines! This is a common use case for apps which can only display a chart with a certain number of data points, independent of the chosen time span.
The integrated Time Series Insights explorer is a very useful tool for testing and debugging the incoming data, which saves a lot of time.
The decoupling of the input API from the output API improves the maintainability of the whole solution. This goes in the direction of CQRS, where separate models exist for writing and reading (querying).
The picture below shows a typical example with Time Series Insights.
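The chart use case boils down to picking a bucket size from the time span and the number of points the chart can show. The Python sketch below illustrates that idea and what a date histogram with an average measure computes. It runs locally on a list of events and does not call Time Series Insights; the function names `bucket_size` and `date_histogram` are my own and are not part of any API.

```python
import math
from datetime import datetime, timedelta, timezone


def bucket_size(start, end, max_points):
    """Smallest whole-second bucket so that [start, end) needs at most `max_points` buckets."""
    span_seconds = (end - start).total_seconds()
    if span_seconds <= 0 or max_points <= 0:
        raise ValueError("need a positive time span and a positive number of points")
    return timedelta(seconds=max(1, math.ceil(span_seconds / max_points)))


def date_histogram(events, start, end, size):
    """Average (timestamp, value) events per bucket of length `size`; empty buckets are omitted."""
    buckets = {}
    for ts, value in events:
        if not (start <= ts < end):
            continue  # outside the search span
        index = (ts - start) // size  # integer bucket number
        total, count = buckets.get(index, (0.0, 0))
        buckets[index] = (total + value, count + 1)
    return [
        (start + index * size, total / count)
        for index, (total, count) in sorted(buckets.items())
    ]
```

With this approach the client asks for, say, at most 100 points, and the same code path serves a span of one hour or one year. With pre-computed aggregates, the same request would have to be mapped onto whichever stored granularity happens to be closest.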
In this example the Stream Analytics job contains only the logic to reshape the data. No aggregation is required, so the performance requirements are considerably lower. It might even be possible to omit the Stream Analytics job completely if the data already has the appropriate format and no additional logic, for example for alarming, is required. The preprocessed data is forwarded to an Event Hub which is attached to Time Series Insights by an Event Source (not shown in the picture). Time Series Insights stores the data in an appropriate format which facilitates fast retrieval. Time Series Insights already contains an API which can be used to query aggregates and raw events. The Query API here is much smaller and acts more like a serving layer: it hides the technical details of the Time Series Insights API from the public API for single page web apps or mobile apps, and it returns data combined from Time Series Insights and other data sources.
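A thin serving layer of this kind could look roughly like the following Python sketch. All names in it are hypothetical (`TimeSeriesStore`, `QueryApi`, `ChartResponse`, the device name lookup), and the store is only an interface: a real adapter would translate the call into a Time Series Insights query, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Protocol


class TimeSeriesStore(Protocol):
    """Hypothetical port; a real adapter would call the Time Series Insights query API."""

    def aggregates(
        self, device_id: str, start: datetime, end: datetime, size: timedelta
    ) -> list[tuple[datetime, float]]: ...


@dataclass
class ChartResponse:
    device_id: str
    display_name: str
    points: list[tuple[datetime, float]]


class QueryApi:
    """Serving layer: hides the store and enriches its data with another data source."""

    def __init__(self, store: TimeSeriesStore, device_names: dict[str, str]):
        self._store = store
        self._device_names = device_names  # stands in for a second data source

    def chart(self, device_id: str, start: datetime, end: datetime, max_points: int) -> ChartResponse:
        if max_points <= 0 or end <= start:
            raise ValueError("need a positive time span and a positive number of points")
        size = (end - start) / max_points  # granularity is chosen per request
        points = self._store.aggregates(device_id, start, end, size)
        name = self._device_names.get(device_id, device_id)
        return ChartResponse(device_id, name, points)
```

The point of the design is that the public API only knows about charts and devices. The granularity is derived per request, and nothing in the serving layer depends on how or where the aggregates are computed.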
Conclusions
Due to the restrictions on the amount of data and the retention time, currently a maximum of 400 days, Time Series Insights is not intended to store data over the long term. For those familiar with the lambda architecture pattern, Time Series Insights is a good store for data at speed, sitting between the speed layer and the serving layer. For long-term data storage there are other solutions, such as Data Lake Storage or simple blob storage, which can be processed by HDInsight.
The real benefit of Time Series Insights is that it gives clients high-performance access to the most recent data, aggregated over any time range and granularity.
This post first appeared on: martin-weber.github.io