Big data platform

Cloud Stack with RynBI

Data warehouse

SQL Data Warehouse is a key component of an end-to-end big data solution in the cloud.

SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution. Import big data into SQL Data Warehouse with simple PolyBase T-SQL queries, and then use the power of MPP to run high-performance analytics. As you integrate and analyze, the data warehouse will become the single version of truth your business can count on for insights.

In a cloud data solution, data is ingested into big data stores from a variety of sources. Once in a big data store, Hadoop, Spark, and machine learning algorithms prepare and train the data. When the data is ready for complex analysis, SQL Data Warehouse uses PolyBase to query the big data stores. PolyBase uses standard T-SQL queries to bring the data into SQL Data Warehouse.
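
As a concrete illustration, the T-SQL below sketches the usual PolyBase pattern for bringing blob-store data into SQL Data Warehouse: declare an external data source and file format, define an external table over the files, then land the data with CREATE TABLE AS SELECT. All object names and the storage location are hypothetical, and the database-scoped credential that a private container would need is omitted for brevity.

    -- Hypothetical names throughout; credential omitted for brevity.
    CREATE EXTERNAL DATA SOURCE AzureBlobStore
    WITH (
        TYPE = HADOOP,
        LOCATION = 'wasbs://data@myaccount.blob.core.windows.net'
    );

    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
    );

    -- External table: a schema projected over the files; no data is moved yet.
    CREATE EXTERNAL TABLE ext_sales (
        sale_id   BIGINT,
        sale_date DATE,
        amount    DECIMAL(18, 2)
    )
    WITH (
        LOCATION = '/sales/',
        DATA_SOURCE = AzureBlobStore,
        FILE_FORMAT = CsvFormat
    );

    -- CTAS brings the external data into the warehouse in parallel across the MPP nodes.
    CREATE TABLE SalesStage
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM ext_sales;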

SQL Data Warehouse stores data in relational tables with columnar storage. This format significantly reduces data storage costs and improves query performance. Once data is stored in SQL Data Warehouse, you can run analytics at massive scale. Compared to traditional database systems, analysis queries finish in seconds instead of minutes, or hours instead of days.
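
For example, a warehouse fact table might be declared as follows; the schema and distribution key are illustrative only. The clustered columnstore index is what provides the columnar storage described above.

    -- A minimal sketch (hypothetical schema and distribution key).
    CREATE TABLE dbo.FactSales
    (
        sale_id   BIGINT         NOT NULL,
        sale_date DATE           NOT NULL,
        store_id  INT            NOT NULL,
        amount    DECIMAL(18, 2) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(sale_id),    -- spreads rows across the MPP distributions
        CLUSTERED COLUMNSTORE INDEX      -- columnar storage: high compression, fast scans
    );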

The analysis results can go to worldwide reporting databases or applications. Business analysts can then gain insights to make well-informed business decisions.

Reporting and Planning tool

Information defining the OLAP data source is embedded in the worksheet. When you open a report or refresh the data connection, Excel uses Microsoft SQL Server Analysis Services (SSAS) to connect to the data warehouse and retrieve key performance indicators (KPIs) and other data. Once opened, the worksheet contains a snapshot or subset of the data in the data warehouse. If you save the worksheet, the data source connection information, KPIs, and any other customizations you have made are saved with it. If you save the worksheet to an analysis library, you can later reopen it without having to use the Service Manager console.

The KPIs included in Service Manager data cubes are predefined calculated measures, defined on the server, that make it possible to track indicators such as status (does the current value meet a specific target?) and trend (how is the value changing over time?). When these KPIs are displayed in a PivotTable, the server can send related icons, similar to the Excel icon set, that indicate whether a status level is above or below a certain threshold (for example, with a stop-light icon) or whether a value is trending up or down (for example, with a directional arrow icon).
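
Actual KPIs are defined inside the cube on the server, but, purely as an illustration of the status calculation, the logic amounts to something like the following T-SQL (the table and thresholds are hypothetical):

    -- Illustration only: SSAS KPI status is defined server-side, but its logic
    -- reduces to comparing a value against a goal and mapping it to a state.
    SELECT MetricName,
           CurrentValue,
           Goal,
           CASE
               WHEN CurrentValue >= Goal       THEN  1   -- on target  (green light)
               WHEN CurrentValue >= 0.8 * Goal THEN  0   -- at risk    (yellow light)
               ELSE                                 -1   -- off target (red light)
           END AS KpiStatus
    FROM dbo.Kpis;   -- hypothetical table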

PivotTables can help you quickly and easily create useful reports. PivotTables built on Service Manager data cubes include many predefined categories, called measure groups or dimensions. These groups are the highest level of categorization, and they help you examine the data and focus your analysis. In turn, most measure groups have many additional levels of subcategories and individual fields. All the categories, subcategories, and fields are contained in the PivotTable Field List. For example, you can create a straightforward report using the following steps:

1. In the PivotTable Field List, select a category and add it as a row.
2. Select a second category and add it as a column.
3. Select a category or subcategory to add values.

Data Modelling tool

Microsoft SQL Server Analysis Services incorporates a sophisticated algorithm to select aggregations for precalculation so that other aggregations can be quickly computed from the precalculated values. For example, if the aggregations are precalculated for the Month level of a Time hierarchy, the calculation for a Quarter level requires only the summarization of three numbers, which can be quickly computed on demand. This technique saves processing time and reduces storage requirements, with minimal effect on query response time.
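
The Month-to-Quarter example is just a rollup. The T-SQL below is not how Analysis Services stores its aggregations; it only illustrates the arithmetic, against a hypothetical precalculated monthly aggregate table:

    -- Hypothetical precalculated Month-level aggregate; each Quarter value is
    -- just the sum of the three monthly rows that fall inside it.
    SELECT [Year],
           ([Month] - 1) / 3 + 1 AS [Quarter],   -- integer division maps months 1-12 to quarters 1-4
           SUM(MonthlyTotal)     AS QuarterlyTotal
    FROM   agg_SalesByMonth
    GROUP BY [Year], ([Month] - 1) / 3 + 1;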

The Aggregation Design Wizard provides options for you to specify storage and percentage constraints on the algorithm to achieve a satisfactory tradeoff between query response time and storage requirements. However, the Aggregation Design Wizard's algorithm assumes that all possible queries are equally likely. The Usage-Based Optimization Wizard lets you adjust the aggregation design for a measure group by analyzing the queries that client applications have actually submitted. By using the wizard to tune a cube's aggregation design, you can increase responsiveness to frequent queries, at the cost of slower responses to infrequent ones, without significantly affecting the storage needed for the cube.

Aggregations are designed by using the wizards, but they are not actually calculated until the partition for which they are designed is processed. After the aggregations have been created, if the structure of the cube changes, or if data is added to or changed in the cube's source tables, it is usually necessary to review the cube's aggregations and process the cube again.

Data Processing and Analytics Service

Tabular models in Analysis Services are databases that run in-memory or in DirectQuery mode, connecting to data directly in back-end relational data sources. By using state-of-the-art compression algorithms and a multi-threaded query processor, the Analysis Services VertiPaq analytics engine delivers fast access to tabular model objects and data for reporting client applications such as Power BI and Excel.

While in-memory is the default mode, DirectQuery is an alternative query mode for models that are too large to fit in memory, or where data volatility precludes a reasonable processing strategy. DirectQuery achieves parity with in-memory models through support for a wide array of data sources, the ability to handle calculated tables and columns in a DirectQuery model, row-level security via DAX expressions that reach the back-end database, and query optimizations that result in faster throughput.

Tabular models are created in SQL Server Data Tools (SSDT) using the Tabular model project template. The project template provides a design surface for creating semantic model objects like tables, partitions, relationships, hierarchies, measures, and KPIs.

Tabular models can be deployed to Azure Analysis Services or an instance of SQL Server Analysis Services configured for Tabular server mode. Deployed tabular models can be managed in SQL Server Management Studio.

Data ETL framework

Microsoft Integration Services is a platform for building enterprise-level data integration and data transformation solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.

Integration Services includes a rich set of built-in tasks and transformations, graphical tools for building packages, and the Integration Services Catalog database, where you store, run, and manage packages.
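
For example, a package deployed to the catalog can be run entirely from T-SQL through the SSISDB catalog stored procedures; the folder, project, and package names below are hypothetical:

    DECLARE @execution_id BIGINT;

    -- Create an execution instance for a package deployed to the catalog.
    EXEC SSISDB.catalog.create_execution
        @folder_name  = N'ETL',
        @project_name = N'WarehouseETL',
        @package_name = N'LoadWarehouse.dtsx',
        @execution_id = @execution_id OUTPUT;

    -- Run synchronously so the batch waits for the package to finish.
    EXEC SSISDB.catalog.set_execution_parameter_value
        @execution_id,
        @object_type     = 50,               -- system parameter
        @parameter_name  = N'SYNCHRONIZED',
        @parameter_value = 1;

    EXEC SSISDB.catalog.start_execution @execution_id;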

You can use the graphical Integration Services tools to create solutions without writing a single line of code. You can also program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.