Data fusion is the process of combining data from multiple sources to build more sophisticated models and to better understand the behaviour or characteristics of the subject or phenomenon being studied. In practice, this often means bringing together data from different sources or sensors and centralizing it for analysis.
Data fusion applies to a wide range of everyday phenomena. For example, data scientists might use data fusion to combine physical location data with environmental data. In a business setting, analysts may combine client identifier data with online purchase history and data collected at brick-and-mortar store locations to better understand customer behaviour and, for example, make better purchase suggestions.
Because data fusion combines data from different sources, the data to be fused can vary widely in nature. Some sources provide real-time information, updated millisecond by millisecond, about a behaviour or phenomenon, such as the temperature or vibrations of an engine. Some provide geospatial data, meaning the location of a tracked object, whether it is tracked outdoors via GPS or indoors via an indoor positioning system. Others provide timeseries data, e.g. the development of crop prices, or batch data, such as the purchase history of your customers.
To fuse different kinds of data from different sources, you need tools and processes that can deal with a multitude of underlying data formats, work with both real-time and batch data, process geospatial and timeseries data, and bring everything together according to the desired processing and business rules.
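As an illustration of bringing real-time and geospatial data together, the following is a minimal Python sketch of one common approach: an "as-of" join that attaches the most recent known position to each sensor reading. The record types (`Reading`, `Position`), the function name `fuse_asof` and the `max_age_s` staleness rule are assumptions made for this example and are not part of any specific product or API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical geospatial sample, e.g. from GPS or indoor positioning."""
    ts: float   # timestamp in seconds
    lat: float
    lon: float


@dataclass
class Reading:
    """Hypothetical real-time sensor sample, e.g. engine temperature."""
    ts: float   # timestamp in seconds
    temperature_c: float


def fuse_asof(readings, positions, max_age_s=5.0):
    """Attach to each reading the latest position at or before its timestamp.

    Readings with no earlier position, or whose latest position is older
    than max_age_s (an assumed business rule), are left out of the result.
    """
    positions = sorted(positions, key=lambda p: p.ts)
    position_times = [p.ts for p in positions]
    fused = []
    for r in sorted(readings, key=lambda r: r.ts):
        # Index of the last position with ts <= r.ts, or -1 if there is none.
        i = bisect_right(position_times, r.ts) - 1
        if i < 0:
            continue
        p = positions[i]
        if r.ts - p.ts > max_age_s:
            continue
        fused.append(
            {"ts": r.ts, "temperature_c": r.temperature_c,
             "lat": p.lat, "lon": p.lon}
        )
    return fused
```

The staleness limit is one possible example of a processing rule: it keeps a temperature reading from being pinned to a location that was recorded too long ago to be trusted.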
Data fusion needs data, and data comes in different formats and over different protocols from various sources. IoT systems may use MQTT or AMQP, and may furthermore package data in Protobuf or JSON formats. Satellite data comes in specific raster formats, such as GRIB, NetCDF and HDF. Business data may come in various formats from underlying ERP, CRM and similar systems; some of these are modern, while legacy systems may use legacy or even outdated data formats. Positioning and tracking systems provide data in different geospatial formats, such as GeoJSON, WKB, OpenStreetMap data formats or commercial formats. Big data systems may use specific big data formats such as ORC or Parquet, and different database solutions come with their own interfaces (SQL etc.). Furthermore, reference data sets are quite often available in CSV or JSON formats, for example when exported from Excel files or similar tools.
To integrate these systems and fuse their data, all of these data formats need to be handled.
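To make the format problem concrete, here is a small Python sketch, using only the standard library, that reads records from a JSON source and a CSV source into a common shape (a list of dictionaries) and then joins them on a shared key. The function names, the `client_id` field and the sample data are hypothetical and chosen only to mirror the customer example earlier in the text.

```python
import csv
import io
import json


def records_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return [dict(item) for item in data]


def records_from_csv(text):
    """Parse CSV text with a header row into a list of dicts.

    Note that every CSV value is returned as a string.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def fuse_by_key(left, right, key):
    """Inner-join two record lists on a shared key.

    Keys are compared as strings, because the same identifier may be a
    number in one format and text in another. On overlapping field names
    the value from the left record wins.
    """
    index = {str(r[key]): r for r in right}
    fused = []
    for record in left:
        match = index.get(str(record[key]))
        if match is not None:
            fused.append({**match, **record})
    return fused


# Hypothetical sample data: online orders as JSON, store visits as CSV.
online = '[{"client_id": 1, "online_orders": 3}, {"client_id": 2, "online_orders": 1}]'
store = "client_id,store_visits\n1,5\n3,2\n"

combined = fuse_by_key(records_from_json(online), records_from_csv(store), "client_id")
```

Only client 1 appears in both sources, so `combined` holds a single record. Its `store_visits` value is still the string read from the CSV, which shows why type normalization is usually a separate step in a real pipeline.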
WhereOS makes it easy to combine and fuse different types of data to create valuable products or services out of raw data. WhereOS supports real-time, batch, geospatial (indoor and outdoor), satellite (raster), timeseries and other data sources, and enables all of this data to be fused together. WhereOS comes with support for various data formats and protocols out of the box, and can be extended via specific drivers to work with any data source or protocol, whether it is streaming, static, timeseries, geospatial or something else.
These capabilities help developers speed up data fusion work. WhereOS is an innovation platform that helps you use data from any data source, such as data lakes, data warehouses, databases, and real-time IoT platforms and devices. WhereOS helps you create AI solutions, APIs and applications using the data. You can, for example, open the data up to internal developers and third-party start-ups to innovate through challenge competitions, or use the WhereOS partner network to bring in AI specialists to turn your data into something valuable.