Looking for an IoT Data Warehouse? Here Is the Solution

Krystian Dziubinski

Jul 29, 2021 • 13 min read
Find a warehouse solution for IoT data

Everything we use on a daily basis is gradually getting “connected” to the Internet.

Our fridges, thermostats, cookers, heaters, boilers, watches, and many more can now come with an internet connector. This massive network of connected devices and machines is what makes the Internet of Things (IoT).

IoT is more than just homes — smart cities, cars, factories, and healthcare are just a few examples of IoT networks. All of these IoT devices generate data, by using numerous sensors that constantly make measurements such as temperature, humidity, air pressure, gas levels, location, heartbeat, etc.

However, IoT data is not just sensor data but also behavioral user interactions data. The devices and machines often come with user interfaces whose interaction data may also be gathered and analyzed.

Every click, touch, pageview, scroll, or even mouse movement can be recorded and stored for further analysis. This generates a massive amount of clickstream data, which needs to be organized and stored. To address this challenge we need a flexible, scalable, secure, and well-performing solution that keeps up with technological advancements. Snowflake data warehouse can come to the rescue.

Understanding the seven Vs of Big Data

Data, in general, can be characterized in seven Vs and Internet of Things data makes full use of all these characteristics:

1. Volume: expresses the size of data

The amount of data generated by IoT devices is expected to reach 73.1 ZB (zettabytes) by 2025. In 2019, 127 devices per second hooked up to the internet for the first time.

2. Velocity: the speed at which the data is generated

17.3 ZB of data was produced in 2019. The speed at which the data is being produced now is not comprehendible by the human brain.

3. Variety: different data types and formats

Due to ever-changing software and hardware, data storing formats may vary. Recently, the most common are JSON, XML, and Parquet, but the ever-increasing list of file formats is tremendously long.

4. Variability: data meaning is constantly changing

Even in a basic business model that has multiple sales channels, it can still be problematic to calculate simple metrics like “revenue”.

5. Veracity: the trustworthiness of the data in terms of accuracy

In IoT there is always a chance of inaccurate data, which may be off by some constant, coefficient, or even exponent. However, it's important to understand that by simply knowing this “off” number, the data will still be usable.

6. Visualization: the data is in a manner that’s readable and accessible

Aggregation or joins may be very difficult to compute if the data is spread across closed data marts — various databases or data warehouses that process the data differently. Therefore, it is important to be able to bring all data together into a unified, distributed data storage that can support streaming and data batching.

7. Value: monetary or meaningful

The industrial IoT market size is expected to reach $123.89 billion in 2021.

What is unique about the Snowflake platform Data Cloud warehouse?

The Snowflake platform enables data storage, data processing, and data analytics. It uses data faster and more easily than other similar platforms.

Snowflake_architecture

Image source: Snowflake: Key Concepts & Architecture

The Snowflake architecture is divided into three layers:

  • Database storage. It is capable of storing nearly unlimited amounts of data thanks to its cloud-based data storage. The data stored in Snowflake's cloud data warehouse is compressed, and columnar, with automatic micro-partitioning and support for semi-structured data. It allows for storing an enormous number of rows and columns in a table-like structure.
  • Query processing. It is important to note that the data is only accessible to SQL users. Therefore, with Snowflake data warehouse, it is possible to invoke what they call “virtual warehouses” or independent computing clusters, with just a few clicks or a very simple snowsql query.
  • Cloud Services. It is a collection of services that harmonizes different activities across Snowflake:
    1. Authentication
    2. Infrastructure management
    3. Metadata management
    4. Query parsing and optimization
    5. Access control

Cloud-native data loading

Snowflake platform provides support for semi-structured origin formats such as JSON, Avro, ORC, Parquet, or XML. This makes it much easier to load data from various sources into the warehouse.

Additionally, the transition of the data from the data lake into the warehouse works seamlessly thanks to integration with cloud solutions such as AWS S3, Google Cloud Storage, or Microsoft Azure Blob Storage with Snowflake’s external stages and Snowpipe.

Snowflake’s external stage reads the cloud storage and Snowpipe loads the data automatically if the auto-load is enabled. The autoload can also be disabled for more control over this process.

Scaling up vs scaling out warehouses

Further data transformations can be made on schedule with Snowflake’s streams and tasks, which helps massively in the organization of the data within the data warehouses.

All queries are executed using virtual warehouses:

  • Scaled UP by adding more servers to the cluster.
  • Scaled OUT by creating more clusters to support more users and run more queries simultaneously.
  • Scaled ACROSS by isolating workloads of different sized warehouses for individual types of queries. For example, warehouses can be used for marketing analysis simultaneously with others for data transformations.

Furthermore, Snowflake data warehouse caches queried data by first preserving metadata in a cloud services layer such as:

  • Partition-level metadata like row count.
  • Partition-column-level metadata like min/max values, number of DISTINCT or NULL values.
  • Table versions and references to physical files.

Some operations don't even require running a virtual warehouse!

On top of that, all other queried results are stored in the virtual warehouse’s SSD storage and, as long as the warehouse exists, the data will be available to all users of the same warehouse to retrieve with blasting fast execution and a large cost-saving.

Snowflake_IoT_Reference_Architecture

Image source: Snowflake: IoT Reference Architecture

The crucial role of time series and date data in IoT

When it comes to IoT, time series and date data play a crucial role in data query optimization. Therefore, Snowflake provides a vast number of functionalities that help to deal with the simplification of queries as well as tuning performance for time series.

Often, base operations are supported by in-built SQL functions for time construction, extraction, addition/subtraction, truncation, conversion, or time zoning. Furthermore, query optimization can be set to automatic and custom, partitioning and clustering. Tables can be easily reclustered or repartitioned by specifying a cluster key in the table definition.

The disadvantages of using Snowflake

Snowflake technology is an amazing data warehouse solution that can be used for, among, others, IoT analytics but there are two downsides to take into account:

  • Storing data that is not expected to be used in the near future, like archived data, may be done more cheaply in other cloud environments as Snowflake doesn’t provide different storage tiers for infrequently accessed data.
  • Snowflake Time Travel, if not used carefully, can get very expensive. If seven days of time travel are enabled on a table, all operations such inserts or data updates can be reversed for up to seven days. If time travel is enabled, all operations on the data are really just pointer operations. The underlying data is not changed until the time travel specified days are past until that time all operations are reversible.
    After that time, the data gets permanently changed but until that happens the pointers have to be stored by Snowflake. To keep track of operations, Snowflake stores the pointers and metadata of all operations over these seven days. Storage and retrieval of all changes to the pointers of the data are what makes time travel expensive.

Snowflake helps to solve complicated Business Intelligence problems

We could enumerate the capabilities of Snowflake platform for a very long time but what we see is that when data, especially IoT, gets out of control, we need technology that works. Snowflake will do a great job when processes need optimization for OLAP operations such as complex joins of big tables, aggregations, or analytical windowing.

Additionally, the support for semi-structured data, cloud integrations, the rich base of scalar functions, with support for encryption as well as geospatial or binary data types, will do perfectly when dealing with a huge variety and variability of the data. Even the toughest Business Intelligence problems can be solved with Snowflake that provides IoT analytics.

Warehouse solution for IoT data in retail

Let’s have a look at the modern solutions in the retail industry in order to get a better understanding of how Snowflake data warehouse fits into the IoT analytics.

There may be many data-driven, problem solving use cases in retail. Technological advancement in physical retail stores brings tons of ideas for the use of sensors, cameras, tills, scanners and mobile devices. Combining this with marketing, sales and revenue data, an amazing 360 view of the logistics around product, customers and store can be achieved.

Examples of using IoT in retail

Business analysis of the stock in physical retail stores can be done with the use of a network of sensors on the shelves to automatically analyze the most commonly chosen items by customers. Businesses can use this data for further analysis to understand the causes of shrinkage in the stores, by combining it with sales and product data from tills.

Also, the data from sensors on shelves can also be combined with marketing activities to optimize multiple channels for various product-oriented marketing campaigns. Knowing what customers pick from a shelf at a given time and place provides valuable information about a customer’s shopping preferences.

Additionally, the 360 view of the product, from the distribution center until the product is sold to the customers, could be achieved by providing mobile devices with barcode scanners to the store assistants. Many retailers need to know very precisely what and how products are packed and stored in the distribution centers, as well as when and where those products are sold.

However, it can be difficult to see what is happening in the middle of that product journey. When the delivery from the distribution center reaches the stores and is picked up to be put on the shelves, with a clever use of mobile devices, the store assistants can scan products on the shelves while unloading the delivery.

The combination of scanners and sensors on the shelves provides previously invisible information to Business Intelligence and a gap in the product’s lifetime gets uncovered. This way the business gets ready to address the toughest challenges in logistics, marketing and sales thanks to IoT analytics.

Use smart IoT data warehousing for IoT analytics

Looking at the above examples we can clearly see that this combination of data characterised in the 7 Vs into a single warehouse is a challenge for a warehousing platform. However, with Snowflake’s scaling, semi-structured data support, cloud native data loading, and cloud services that the warehouse offers, the IoT data collection, organization, joins and preparation for visualisation can be done with minimal effort.

Combining a high volume of IoT data requires holistic data extraction, data transformation and loading into a centralized data warehouse. Later, this massive amount of data requires complex joins and aggregations so that the queries from Business Intelligence can run smoothly.

As explained above, Snowflake platform is one of the best solutions for businesses to benefit from IoT analytics.

More posts by this author

Krystian Dziubinski

Krystian Dziubinski works as a Senior Data Engineer at Netguru.
Fuel your digital growth with cloud solutions