Compressing Data Cube in Parallel OLAP System

Boyong Liang

Abstract

Data warehouses provide the primary support for Decision Support Systems (DSS) and Business Intelligence (BI) systems. One of the most interesting recent research themes has been the computation and manipulation of the data cube, a relational model that can be used to support On-Line Analytical Processing (OLAP).

Given massive data volumes, data cube computation has to be very efficient in both time and space. Many research studies have shown that parallel computation effectively speeds up data cube construction. Data compression, on the other hand, is not only crucial for computing and storing data cubes in limited space but also reduces I/O accesses.

This thesis surveys the most recent database compression techniques and proposes a more efficient data cube compression algorithm. I also integrate this algorithm into the PANDA system, which has proven itself to be one of the most efficient parallel OLAP computing systems. The experimental results demonstrate that this algorithm greatly reduces data cube storage space while keeping running time in the same range in parallel OLAP computing systems. As a result, it makes PANDA a more efficient and practical OLAP system.