译者说明:欢迎访问我的Blog: http://blog.csdn.net/daidaoke2001/
译文中的错误或不当之处望不吝指出,这也是我坚持翻译工作的最大动力。
我的Email:tangtaike@hotmail.com
如需转载,请事先通知。
第四章第二节 透视表组件如何处理数据
透视表组件最重要和最复杂的方面之一就是它是如何与各种数据源进行交互,以及它在一个会话中是如何操作数据的。本节会解释透视表控件如何与数据源通讯,以及在会话期间是如何传输和操作数据的。
透视表控件的功能是有一点不确定的――因为它的大部分功能都依赖于它所连接到的数据源的类型。它基本上只能使用两类数据源:表列数据和多维数据。(多维数据源也可以称为OLAP数据源;本书中我会交替使用这两个术语。)我们也会讨论使用XML数据来作数据源的情况。虽然XML数据看起来与任何其它用于透视表控件的表列数据源很相似,但它还是有一些需要特别讨论的需求。
表列数据源
现存任何公开数据表的OLE DB数据源都是表列数据库。一般来说,它们都是属于关系数据库引擎的领域的。不过,这个类别也能包含非关系型的数据提供者――只要它们具有某种形式的文本命令语法或者命名的表格???。
图4-6显示了将一个表列数据源返回的数据装载到透视表组件中后报表最初的样子。(您也可以通过运行随书光盘Chap04文件夹下的PivotTableList.htm文件来查看这个报表。)
这个报表和在一个Excel电子表格中导入一个外部数据区域后生成的报表相似。不过,因为透视表控件结合了外部数据区域和透视表报表两者的功能,所以您现在可以在这个报表中根据任何字段来对数据分组,以及为任何字段创建一个新的合计。例如,使用“Move To Row Area”,“Move to Column Area”和“AutoCalc”工具箱按钮,您可以将这个普通的数据列表转化成如图4-7所示的透视表报表。
图4-6。一个装载了表列数据源的数据的透视表报表。
图4-7。由一个普通的数据列表创建的透视表报表。
因为数据源是表列数据源,所以透视表控件可以显示任何统计值下的细目信息――这就意味着您可以展开任何数字,立刻查看到组成这个数字的各行。图4-8显示了访问表列数据源的一个整体结构。
图4-8。访问表列数据源。
当提取数据时,透视表控件首先连接到连接字符串中的provider属性中定义的OLE DB提供者。当在设计环境中创建报表时,您一般会在数据连接属性对话框的一个列表中选择所需的提供者。提供者是一个寄宿在客户端机器上的进程内COM组件,通常使用一种专用协议与数据服务器进行通讯(如果确实有一台服务器)。例如,SQL Server提供者使用各种协议与服务器进行通讯,最常用的协议叫做命名管道。不过,微软Jet数据库的提供者需要对MDB文件的文件访问权限,因为Jet数据库引擎不是一个客户端-服务器系统。
当透视表控件连接到数据源后,它将它的CommandText属性中的内容传递给提供者来执行。您可以在设计阶段使用属性工具箱中的Data Source段来设置CommandText属性,或者您也可以在运行时使用代码来设置它。透视表控件使用ADO的Recordset对象来执行命令文本,因此任何能够传递给Recordset的Open方法的值都可以在CommandText属性中使用。这些值一般包括SQL语句、表名、视图名或存储过程名。提供者执行后,返回一个OLE DB的IRowset接口,以便客户端能够访问执行命令返回的数据。
当操作表列数据源时,透视表控件使用ADO来将返回的数据立刻装载到名为微软游标引擎(WCE)的组件中,WCE是由微软数据访问组件(MDAC)提供的一个组件,它提供了在任何数据提供者上进行高级遍历,排序和过滤的功能。WCE将数据从数据源提供者中装载到它自己的内存缓存中,如果缓存中包含的数据超出了它所允许的内存上限,这些数据最终还是会被换页到磁盘上。(这可以防止WCE耗尽您所有可用的系统内存。)当数据被装载到WCE中后,透视表控件通过与WCE通讯来实现过滤,排序,以及遍历数据集。
当您开始根据一个字段对结果集进行分组,或当您使用与前面所介绍方法相似的步骤来创建一个统计值时,魔术般的事情发生了。为了形成交叉表格,透视表控件使用了另一种名为“透视表服务组件”的数据管道。这个组件实际上是OLAP服务的客户端提供者,但是它也能在不需要数据源的情况下,在客户端上创建临时cube。当您根据字段进行分组或创建一个统计值时,透视表控件将一个指向数据集的引用和需要临时cube中的那个维和哪个统计值的描述信息传递给透视表服务组件。这个引擎会在微软Windows操作系统的临时文件夹下建立一个临时文件,因此如果您公司的策略是不允许web浏览器中的控件创建临时文件,那么你就要注意这个问题。
命名临时cube文件
在实现这个功能时,透视表组件的明星数据开发者之一,David Worktendyke,必须设计一种命名临时cube文件的方案,以便它不会覆盖任何现存的文件或影响运行在其它应用程序中的另一个透视表控件。他提出的最终方案是使用当前进程和线程ID以及惯用的CUB扩展名来组成文件名。
因此当在使用表列数据源的透视表控件中进行分组和创建统计值时,如果您在您的临时文件夹下看到一些名字很奇怪的文件,记住它是由透视表控件创建的临时cube文件。不必担心――当控件销毁时这些文件会被自动删除。
当操作表列数据时,透视表控件会自动为细目数据中的每个日期或日期/时间字段生成两个时间层。一层包括年,季度,月和天的分组间隔;另一层包括年,周和天的分组间隔。(这两层都是必需的,因为周不能恰好组成一个月。)这些自动形成的层使得可以很容易的分析那些具有时间维的数据,允许您查看每一个时间间隔对应的数据摘要。
如果您计划在一个web页面上使用透视表控件,您可能还需要研究如何使用Remote Data Services(RDS)。这是MDAC中提供的另一种数据访问管道,它使用HTTP来访问数据源。当使用RDS时,客户端仅需要RDS提供者这一种提供者,它是随Office Web组件一起被安装的。RDS提供者然后就会通过web服务器和实际的数据提供者(例如,SQL Server)进行通讯,这使得原始数据源提供者只存在与服务器上。如果需要了解RDS的更多信息,请参考微软网站http://www.microsoft.com/data上关于数据访问部分的内容。
多维(OLAP)数据源
您很可能非常熟悉表列数据源或关系数据源,但您可能并不了解多维(或OLAP)数据源。在我介绍透视表组件中的各元素是如何映射到多维数据库的各结构上之前,请允许我先简单介绍一下多维数据库的概念。
OLAP简介
在关系数据库中,表和关系是最主要的数据结构和概念,您通过定义包含一列(或多列),主键,规则等的表来构建数据库。然后您通过指定这个表的外键(与其它表的主键相匹配)的方法来在这些表之间建立关系。指定和其它表的主键相对应的外键来在这些表之间建立关系。一旦完成了这些工作,您就可以在数据库引擎上执行SQL语句,可以根据需要使用关联,排序,限制和分组来满足客户的需求。
在多维数据库中,主键的数据结构是一个cube,或者更准确的说,是一个hypercube(超立方体)。这个结构是一个N-维的矩阵,要可视化它有点困难。在每一维中包含的项叫做members(成员),N个members的交点形成一个数字。让我们来看一个例子,可以让我们觉得不那么的抽象。
假设我们在一个超立方体中对一个公司的销售数据进行建模。在我们的例子中,我们从二维开始:产品和客户。二维结构很容易形象化,因为它的样子象一个矩形,您可能在比较两维信息时见过这种矩形,例如一个交叉表。图4-9显示了一个矩形可能的样子。
图4-9。一个二维的数据库。
请注意客户名称显示在一维中,而产品名称显示在另一维中,中央区域的数字是销售额。对于任何产品和客户的组合,都会存在一个数值,代表这个客户在这个产品上消费的金额的总和。还要注意在每一维中都有一个名为All的额外成员。这个成员代表了当前维所有成员的统计值(一般是所有成员的总和)。因此,Customers.All和某个产品的交叉点代表了这个产品的总销售额。同样,Products.All和某个客户的交叉点代表了这个客户所产生的总的消费额。两个All成员的交叉点就是所有客户,所有产品的销售总额。
现在设想将包含销售人员姓名的第三维加入到这个矩形中。结构就会变成一个三维的立方体,在概念上类似图4-10所示。
现在三个坐标――客户,产品,和销售人员――确定了立方体中的每个交叉点,或者说单元。在销售人员维中又出现了一个名为All的成员,它代表所有销售人员的销售总额。这个结构允许您从多个角度来查看数据,从而帮助您回答各种问题。因为这些数值都存储在结构中,所以多维数据库能够快速存取这些单元的任何集合。
图4-10。一个三维数据库。
很难将四维结构可视化,但可以设想您需要对另外的数据值进行汇总。例如,您可能不但需要了解销售的物品的销售额,还需要了解销售的数量。这些多值创建了包含两个成员(销售物品的销售数量和销售金额)的第四维。这些数值在多维数据库中被称为measures(尺寸);不过大多数的数据源将尺寸看作维。图4-11显示了一种形象化四维数据的方法。
在多维数据库内部仍然会在一个四维的超立方体中存储所有的数据。但是您可以采用将一个四维数据结构看作是多个三维立方体的方式来理解四维数据结构。如果您需要查看一个特定的客户,产品和销售人员交点的商品销售金额,您应该查看第一个cube。如果您需要了解同一个交叉点的产品销售数量,您应该查看第二个cube。当然您可以扩展这个例子以显示立方体中的表,以及显示cube中的的cube――但我到此为止,否则您会在企图可视化16维的空间时发疯。
图4-11。一个四维数据库。
大部分多维数据库也允许您对一个维中包含的成员进行分组,在分组中会对该成员隐式的指定了父元素和子元素。实际上,各维在它们内部都定义了一个或多个层,每一层都包含一个或多个levels(级别),每个级别又都包含一系列的成员。这模仿了大部分分类数据的自然结构――产品通常属于一个相关产品组,客户通常居住在一个国家的一个州的一个城市中,销售人员属于某个地区的某个管区,等等。例如,客户维可能包含级别All,国家,州,城市,和客户名称。国家级别中的成员集合可能是美国,加拿大和墨西哥,而州级别中的成员集合可能是华盛顿,俄勒冈,大不列颠-哥伦比亚,艾伯特,哈利斯科,韦拉克鲁斯等等。
一个单一维中有可能包含多个层。例如,如果您拥有一个雇员维,您可能需要根据部门结构来计算差旅费,以便了解每个经理和部门主管的花费的总和,或您可能需要了解从事某项工作职能的所有雇员(例如,市场、销售、产品开发、或行政人员)的花费的总和。维中的成员是相同的(都是各个雇员),但是他们被组织到不同的层,并因此创建了不同的统计值。
许多书籍,杂志,报告,和大量论文对多维数据库进行了深度的讨论。如果您已经购买了一个多维数据库,那么很可能您的数据库的附属文档对这些概念的描述要比我在这里介绍的详细的多。
透视表组件如何与OLAP数据源交互
透视表组件与OLAP数据源通信和交互的方式,和它与表列数据源通信交互的方式相似。图4-12显示了对这个结构的一个整体描述的图片。
图4-12。透视表组件和一个OLAP数据源之间的交互。
透视表控件使用微软定义的OLE DB for OLAP标准,这个标准被许多多维数据库所支持。这个模型是OLE DB标准的一个扩展,因此透视表控件和OLAP数据提供者之间的交互方式,自然和它与表列数据提供者之间的交互方式相似。控件首先连接数据提供者,这个提供者也是一个寄宿在客户端机器上的进程内COM组件。提供者决定它如何与多维数据库进行通讯。例如,OLAP服务在客户端和服务器之间使用TCP/IP的socket连接。
在透视表组件连接到OLAP数据源后,它可以通过透视表字段列表窗口在一个指定的超立方体中显示所有的层和尺寸。当用户将层和尺寸拖放到透视表控件中时,或者当开发者使用代码插入层和尺寸时,透视表控件在MDX(多维表达式,由OLE DB for OLAP标准定义的查询语言)中生成必需的查询请求,并在数据源上执行它们。最后数据提供者返回查询结果,透视表控件将结果显示在屏幕上。
当操作OLAP数据源时,通过网络传输的数据量是非常小的。OLAP提供者通常只是将MDX查询字符串发送给服务器,而服务器返回的也只是您在界面上看到的单元格和成员的名称。服务器只会将统计值发送回客户端,而不是那些创建这些总值所必需的底层的细目数据。这使得透视表控件可以快速响应请求,提高了系统的可伸缩性,从而能够支持大量的并发客户端。
我应该创建一个cube,还是只需对表列数据进行分组?
当我向人们展示透视表组件能够对表列数据进行分组和汇总,就好象它是一个来自OLAP cube的报表时,人们常常问我,“那么为什么我还要创建一个cube?”
对这个问题的回答要分为两部分。首先,使用一个预创建,基于服务器的超立方体常常能够比通过透视表控件创建表列数据的临时立方体的方式,获得性能上的极大提高。每当您在一个表列数据中对一个新的字段进行分组时,透视表控件必须重新创建cube并重新生成所有的统计值。而一个基于服务器端的cube只会创建这些统计值一次,所有访问这个cube的客户端都会共享这些统计值。
第二,一个预创建的cube可以使用多级别来定义各层,从而建立一个在数据中进行钻取的清晰的路径。而当透视表控件对关系型数据进行分组时,只会为日期字段创建层;它无法知道例如Country,State和city这样的字段实际上是同一层的三个级别。在一个预创建的cube中,您可以定义这些层,使得数据的用户能够方便的找到他们所需要的信息。
XML
透视表组件还有一个特殊的数据源,一个返回特定格式XML数据的URL。在ADO2.1版本中,微软的数据访问小组(开发MDAC的小组)定义了一种保存OLE DB的行集合的XML格式。它们也创建了一种叫做persistence provider(持久提供者)的数据访问管道,它可以通过读写这种格式的XML数据来存取一个OLE DB的行集合。透视表控件能够使用这个提供者来将从某个URL返回的xml数据装载到微软游标引擎中,图4-13描述了这种情况下的结构。
图4-13。使用永久提供者来将xml数据装载到WCE中。
我会马上解释如果要使用这种方法,您必须将那种类型的连接字符串传递给透视表控件。不过,为了即将开始的讨论,还要插一段,透视表控件所需要的最重要的信息是那个可以从中获得XML数据流的URL。透视表控件将这个URL传递给持久提供者,持久提供者接着使用Windows的Internet服务请求这个URL的返回结果。然后对结果解析,并装载到WCE中,而透视表控件接着开始处理结果数据――就象它处理表列数据一样。
这种xml数据的格式是特定的,很遗憾,这种格式相应的文档并不齐全。不过,查看这种格式最简单的方法就是使用ADO Recordset对象的Save方法将一个Recordset的内容以adPersistXML格式保存到一个文件中。如果您需要动态生成这种格式的数据――例如,在一个微软ASP页面中,您可以使用Recordset的Open方法来测试您的输出。如果您可以将您的XML数据装载到一个ADO Recordset对象中,那么它也会成功的被装载到透视表控件中,因为控件使用的是和ADO Recordset对象同样的机制。可以查看第6章介绍的解决方案的源代码,其中有在ASP页面中生成XML数据的例子。
附录:英文原文。
How the PivotTable Component Works with Data
One of the most important and complicated aspects of the PivotTable component is how it interacts with various data sources and how it manages that data during a session. This section will explain how the PivotTable control communicates with data sources, as well as how data is transferred and manipulated during a session.
The PivotTable control is a bit schizophrenic—much of its capabilities depend on the kind of data source to which it is connected. Essentially, the PivotTable control can use only two kinds of data sources: tabular and multidimensional. (Multidimensional data sources can also be called OLAP data sources; I will use these two terms interchangeably in this book.) We'll also discuss using XML data as a data source. Although XML data looks like any other tabular data source to the PivotTable control, it has a few requirements that warrant special discussion.
Tabular Data Sources
Tabular databases include any existing OLE DB data sources that expose tables of data. Traditionally, these are the relational database engines of the world. However, this category can also include nonrelational data providers—as long as they have some form of textual command syntax or named tables.
Figure 4-6 shows how a report initially looks when the PivotTable component is loaded with data returned from a tabular data source. (You can also see this report by running the PivotTableList.htm file from the Chap04 folder on the companion CD.)
The report is similar to that produced by using an external data range in an Excel spreadsheet. However, since the PivotTable control combines the functionality of external data ranges and PivotTable reports, you can now group the data by any field and create a new total for any field. For example, using the Move To Row Area, Move To Column Area, and AutoCalc toolbar buttons, you can transform this flat list of data into the PivotTable report shown in Figure 4-7.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-6. A PivotTable report filled with data from a tabular data source.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-7. A PivotTable report created from a flat list of data.
Because the data source is tabular, the PivotTable control can show the details behind any total—meaning you can expand any number and see the rows that contributed to it right in place. Figure 4-8 shows the general architecture for accessing tabular data sources.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-8. Accessing tabular data sources.
When retrieving data, the PivotTable control first connects to the OLE DB provider named in the provider attribute of the connection string. When building a report in a designer, you typically will choose this provider from a list in the Data Link Properties dialog box. The provider is an in-process COM component that resides on the client machine and typically communicates to the data server (if it is an actual server machine) through a private protocol. For example, the provider for SQL Server communicates to the server using a variety of protocols, the most common being named pipes. However, the provider for Microsoft Jet databases requires file access to the MDB file because the Jet database engine is not a client-server system.
After the PivotTable control connects to the data source, it passes the contents of its CommandText property to the provider for execution. You can set the CommandText property at design time using the Data Source section of the Property Toolbox, or you can set it in code at runtime. The PivotTable control uses the ADO Recordset object to execute the command text, so any value that can be passed to the Recordset's Open method can be used in the CommandText property. These values typically include SQL statements or the name of a table, view, or stored procedure. The provider then returns an OLE DB IRowset interface that allows access to data returned from the command.
When working with a tabular data source, the PivotTable control uses ADO to immediately load the returned data into a component known as the Windows Cursor Engine (WCE). The WCE is a component provided in the Microsoft Data Access Components (MDAC), offering advanced scrolling, sorting, and filtering functionality over any data provider. The WCE loads the data from the source provider into its own memory cache, which will eventually page to disk if it contains more data than its memory threshold will allow. (This keeps the WCE from using all your available system memory.) After the data is loaded into the WCE, the PivotTable control communicates with it to filter, sort, and scroll around the data set.
The magic begins when you start to group the resultset by a field or when you create a total using steps similar to those described earlier. To perform the cross tabulations, the PivotTable control employs another piece of data plumbing known as the PivotTable Services component. This component is actually the client-side provider for OLAP Services, but it also can provide temporary cube creation on the client regardless of the data source. When you group fields or create a total, the PivotTable control hands the PivotTable Services component a reference to the data set and describes what dimensions and totals it needs in the temporary cube. This engine will create a temporary file in the folder that serves as the temporary folder for Microsoft Windows, so beware of this requirement if your company's policy is to not allow controls in a web browser to create temporary files.
Naming the Temporary Cube
When implementing this feature, one of the star data developers of the PivotTable component, David Wortendyke, had to devise a scheme for naming the temporary cube so that it would not overwrite any existing file or interfere with PivotTable controls running in other applications. His eventual scheme was to construct the name of the file using the current process and thread ID and the traditional CUB extension.
So when grouping and creating totals in a PivotTable control using a tabular source, if you see some crazily named file in your temporary folder, remember that it is a temporary cube file being created by the PivotTable control. Don't worry—these files will be deleted automatically when the control is destroyed.
When working with tabular data, the PivotTable control also will automatically generate two time hierarchies for each date or date/time field in the detail data. One hierarchy contains the grouping intervals Year, Quarter, Month, and Day; the other contains the intervals Year, Week, and Day. (Both hierarchies are necessary because weeks do not neatly roll up into months.) These automatic hierarchies make it easier to analyze data that has a time dimension, allowing you to see summary values for each of the intervals.
If you plan to use the PivotTable control on a web page, you might also want to investigate using Remote Data Services (RDS). This is another piece of data access plumbing provided in MDAC, which accesses data sources over HTTP. When using RDS, the only provider needed on the client machine is the RDS provider, which is installed with the Office Web Components. The RDS provider then communicates with the real data provider—for example, SQL Server—through a web server, allowing the native data source provider to exist only on the server. For more information on RDS, consult the data access portion of the Microsoft web site at http://www.microsoft.com/data.
Multidimensional (OLAP) Data Sources
While you are most likely familiar with tabular or relational data sources, you might not be as familiar with multidimensional (or OLAP) data sources. Before I describe how the different elements of the PivotTable component map to the structures of a multidimensional database, let me give you a brief introduction to multidimensional database concepts.
A Brief Overview of OLAP
In a relational database, tables and relationships are the primary data structures and concepts. You construct databases by defining tables that contain one or more columns, a primary key, rules, and so on. Then you relate those tables to each other by specifying the foreign keys that match primary keys in other tables. Once this is done, you can execute a SQL statement against the database engine and it will join, sort, restrict, and group data as needed to fulfill the request.
In a multidimensional database, the primary data structure is a cube, or more precisely, a hypercube. This structure is an N-dimensional matrix, which is a bit hard to visualize. The items contained in each dimension are called members, and the intersection of N members produces a number. Looking at an example will help make this much less abstract.
Imagine that we are modeling sales data for a company in this hypercube. In our example, we will start with two dimensions: Products and Customers. A two-dimensional structure is fairly easy to visualize because it looks like a matrix you might see for comparing two dimensions of information, such as a crosstab report. Figure 4-9 shows how this matrix might look.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-9. A two-dimensional database.
Note that customer names appear in one dimension, product names appear in the other, and the numbers in the center are sales. For any combination of product and customer, a value is stored representing the amount of money the customer spent on the product. Also note that one extra member, named All, appears in each dimension. This member represents the total for all members in that dimension (often a sum of all the members). So the intersection of Customers.All and a specific product represents the total sales for that product. Similarly, the intersection of Products.All and a specific customer represents the total sales made to that customer. The intersection of the two All members is the grand sales total of all products to all customers.
Now imagine adding to this matrix a third dimension that contains salesperson names. The structure becomes a three-dimensional cube and conceptually looks like Figure 4-10.
Three coordinates—a customer, a product, and a salesperson—now determine each intersection or cell in the cube. Again a member named All appears in the Salesperson dimension and symbolizes the total sales for all salespeople. This structure can help you answer a variety of questions by allowing you to view the data from a number of perspectives. Since these numbers are stored in the structure, the multidimensional database can return any set of these cells quickly.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-10. A three-dimensional database.
It is harder to visualize four dimensions, but suppose you want to summarize additional data values. For example, you might want to track quantity sold as well as the dollar value of items sold. These multiple values create a fourth dimension that has two members (Quantity Sold and Dollar Value of Items Sold). These data values are called measures in the multidimensional database; however, most data sources treat measures like any other dimension. Figure 4-11 shows one way of visualizing four dimensions of data.
Internally, the multidimensional database will still store all the data in one, fourdimensional hypercube. But you can conceptualize a four-dimensional data structure by thinking of the fourth dimension as multiple three-dimensional cubes. If you want to see the dollar value of items sold for a given customer, product, and salesperson intersection, you would look at the first cube. If you want to know the quantity sold for the same intersection, you would look at the second cube. You could of course expand this example to show tables of cubes and cubes of cubes—but I will stop before your mind explodes from visualizing 16-dimensional space.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-11. A four-dimensional database.
Most multidimensional databases also let you group the members contained in a dimension, potentially specifying a parent and set of children for each member. In fact, dimensions have one or more hierarchies defined within them, and each hierarchy has one or more levels, each of which has a set of members. This mimics the natural structure of most categorical data—products typically fall into groups of related products, customers live in cities within states within countries, and salespeople belong to certain districts that belong to certain regions and so on. For example, the Customers dimension might have the levels All, Country, State, City, and Customer Name. The set of members at the Country level might be USA, Canada, and Mexico, while the set of members for the State level might be Washington, Oregon,
British Columbia, Alberta, Jalisco, Veracruz, and so on.
It is possible to have multiple hierarchies within a single dimension. For example, if you have an Employees dimension, you might want to calculate travel expenses along organizational lines to see totals for each manager and department head, or you might want to see the totals for all employees that perform a certain job function, such as marketing, sales, product development, or executive staff. The members of the dimension remain the same (the individual employees), but they are organized into different hierarchies and therefore create different totals.
A number of books, journals, reports, and pieces of documentation describe multidimensional databases in depth. If you have purchased a multidimensional database, chances are the documentation that came with your database explains these concepts in much more detail than I have time to do here.
How the PivotTable Component Interacts with OLAP Data Sources
The PivotTable component communicates and interacts with OLAP data sources using a similar approach to tabular data sources. Figure 4-12 provides a general picture of the architecture.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-12. The interaction between the PivotTable component and an OLAP data source.
The PivotTable control uses the OLE DB for OLAP standard defined by Microsoft and supported by many multidimensional databases. This model is an extension of the OLE DB standard, so much of the PivotTable control's interaction with the OLAP data provider naturally is similar to how it interacts with tabular data providers. The control begins by connecting to the data provider, which again is an in-process COM component that resides on the client machine. The provider determines how it will communicate with the multidimensional database. For example, OLAP Servicesuses a TCP/IP socket connection between the client and the server.
After the PivotTable component is connected to the OLAP data source, it can display all the hierarchies and measures in the specified hypercube through the PivotTable Field List window. As the user drags and drops hierarchies and measures to the PivotTable control, or as the developer inserts hierarchies and measures programmatically, the PivotTable control generates the necessary queries in MDX (Multidimensional Expressions, which is the query language defined by the OLE DB for OLAP specification) and executes them against the data source. The data provider returns the results, and the PivotTable control displays them onscreen.
When working with an OLAP data source, the amount of data transmitted across the network is quite small. The OLAP provider typically sends only the MDX query string to the server, and the server returns only the cells and member names that you see onscreen. The server sends only the aggregate values back to the client, rather than all the underlying detail data that was necessary to create those aggregate values. This makes the PivotTable control very responsive and allows the system to scale to support a large number of simultaneous clients.
Should I Make a Cube or Just Group Tabular Data?
When I show people that the PivotTable component can group and total tabular data so that it looks like a report from an OLAP cube, they often ask, "So why should I make a cube?"
The answer is twofold. First, using a precreated, server-based hypercube often yields drastic improvements in performance over using the PivotTable control to create temporary cubes of tabular data. Every time you group a new field in the tabular data, the PivotTable control must re-create the cube and reprocess all the aggregates. A server-based cube processes the aggregates only once and shares them with all the clients accessing the cube.
Second, a precreated cube can define hierarchies with multiple levels that establish a clear drill-down path through the data. When the PivotTable control groups relational data, it creates hierarchies for date fields only; it cannot know that fields such as Country, State, and City are actually three levels of the same hierarchy. In a precreated cube, you can define these hierarchies and make it easier for users of the data to find the information they seek.
XML
One special data source for the PivotTable component is a URL that returns XML data in a specific format. In the ADO 2.1 release, the Data Access Group at Microsoft (the group that makes MDAC) defined an XML format for persisting an OLE DB Rowset. They also built a piece of data access plumbing called the persistence provider, which can save and load an OLE DB Rowset by writing and reading XML data in this format. The PivotTable control is capable of using this provider to load the Windows Cursor Engine with XML data returned from a specified URL. Figure 4-13 depicts the architecture for this scenario.
[url=http://dev.csdn.net/javascript:fullSize(]
Figure 4-13. Using the persistence provider to load the WCE with XML data.
In a moment, I will explain what type of connection string you must pass to the PivotTable control to use this approach. However, for purposes of the discussion at hand, the important piece of information the PivotTable control needs is the URL from which it should retrieve the XML data stream. The PivotTable control hands this URL to the persistence provider, which in turn uses the Internet services of Windows to request the results of that URL. The results are parsed and loaded into the WCE, and the PivotTable control continues on—just as it would when working with tabular data.
The format for this XML data is specific and unfortunately is not well documented. However, the easiest way to see what it looks like is to use the ADO Recordset object's Save method with the adPersistXML format to save the contents of a Recordset to a file. If you want to generate XML data in this format dynamically—for example, in a Microsoft Active Server Pages page—use the Recordset's Open method to test your output. If you can load your XML data into an ADO Recordset object, it will load into the PivotTable control because the control uses the same mechanism. For an example of generating XML data from an ASP page, see the source code for the solution discussed in Chapter 6.