Introduction to DataSets and working with XML files

数据集与XML文件的使用介绍

This article gives you an introduction to .NET's DataSets and how you can use them with XML files

这篇文章介绍了.NET数据集并教会你如何把它们跟XML文件结合起来使用

Introduction

介绍

This article gives you an introduction to using DataSets and how to use them with XML files. Working with them really makes your life easier when you want to share data from a data source and you are thinking of XML.

这篇文章介绍了.NET数据集并教会你如何把它们跟XML文件结合起来使用。当你想从数据源中共享数据而且想到使用XML的话，这将是很轻松的事情。

System Requirements

系统要求

To compile the solution you need to have Microsoft Visual Studio .NET installed. To run any of the client executable you need to have the .NET framework installed.

为了编译程序，你必须安装微软Visual Studio .NET。同时为了客户端程序的执行，.NET框架也是不可缺少的。

The sample provided is a simple application written in C#. It displays a form with a DataGrid. By default, the application connects to a SQL Server, binds the Northwind database and fetches some parent records from table Customers and some child records from table Orders. By default, the application assumes that you have an instance of SQL Server installed on your machine. If this is not the case, then you must manually modify the connection string and then rebuild the sample.

这里提供的是一个用C#编写的小程序。它用了一个DataGrid来显示数据内容。默认情况下，程序链接到SQL Server服务器，帮定Northwind数据库，从Customers表取出主表数据，从Orders表取从表数据。这里假定你已经安装了SQL Server数据库服务器。如果不这样的话，你必须手动修改连接字符串并重新建立。

Then, you can save the dataset to XML file. Schema information is also saved.

然后你就可以把数据集保存到XML文件了，架构信息也会被保存的。

What is a DataSet?

什么是数据集？

A DataSet object is a very generic object that can store cached data from a database in a very efficient way. It is member of the System::Data namespace.

一个数据集对象是一个能够以高效方来存储来自数据库高速缓冲数据的通用对象。它是System::Data命名空间的一个成员。

One obvious question is: When to use a dataset? Well, the answer is: it depends. You should consider that a dataset is a collection of in-memory cached data. So it's good to use datasets when:

一个明显的问题是：什么时候使用数据集呢？答案是这样的：它取决于。你应该明白数据集存储的是内存缓冲器里的数据。所以这几个情况下使用数据集是最好的：

You are working with multiple separated tables or tables from different data sources.

You are exchanging data with another application such as a Web Service.

You perform extensive processing with the records in the database. If you use a SQL query every time you need to change something, processing each record may result in connection being held open which may affect performance.

You want to perform XML/XSLT operations on the data.

l 你同时使用多个独立的数据表或数据表来自不同的数据源。

l 你要作类似于Web Service这样的数据交换。

l 你要处理数据库重大量的记录。如果你每次都使用一条SQL查询来操作数据库，这样将影响性能。

l 你想使用XML/XSLT对数据进行操作。

You should not use a dataset if:

如果在这种情况下，你就应该使用数据集：

You are using Web Forms in your application because Web Forms and their controls are recreated each time a page is requested by the client. Thus, creating, filling and destroying a dataset each time will be inefficient unless you plan to cache it between roundtrips.

你在程序使用Web窗体，而Web窗体和它们的控件每次在页面生成的时候都会在客户端重新建立。这样，你要是不打算把数据集放在缓冲里，每次建立、填充和销毁数据集都会影响效率。

A DataSet has a DataTableCollection object as a member that is nothing but a collection of DataTable objects. You declare a DataSet object and add tables to it like this (in Managed C++):

数据集有存放数据表集合的DataTableCollection对象。你可以声明一个数据集对象并像下面这样把表添加进去。

// Declare the DataSet object

DataSet* MyDataSet = new DataSet ("MyDataSet"); // give it a name here

// Add two tables to it

// - add a Table named Table1

DataTable* Table1 = MyDataSet->Tables->Add ("Table1");

// - add a Table named Table2

DataTable* Table2 = MyDataSet->Tables->Add ("Table2");

You can refer the tables later either using the pointers returned by the Add method or like this:

也可以用Add方法或这样返回的结果来引用数据表：

DataTable* table = MyDataSet->Tables->Item[0]; // or

DataTable* table = MyDataSet->Tables->Item["Table1"];

// isn't this indexation cool?

A DataTable object has two important members: Rows and Columns. Rows is a DataRowCollection object and Columns is a DataColumnCollection. DataRowCollection is a collection of DataRow objects and DataColumnCollection is a collection of DataColumn objects. I am sure you can easily figure out what these objects represent. :)

数据表对象有两个重要成员：行和列（Rows/Columns）。Rows是一个DataRowCollection对象，而Columns是一个DataColumnColletion对象。DataRowCollection与DataColunmCollection分别是DataRow与DataColumn对象集合。相信你能够轻松的理解这些对象的含义。J

Adding data to a data set is straight-forward:

单向填充数据

// adding data to the first table in the DataSet

DataTable* Table1 = MyDataSet->Tables->Item[0];

// add two columns to the table

Table1->Columns->Add ("Column1");

Table2->Columns->Add ("Column2");

// get the collection of rows

DataRowCollection* drc = Table1->Rows;

// create a vector of Objects that we will insert in current row

Object* obj[] = new Object* [2];

obj[0] = new String ("Item 1");

obj[1] = new String ("Item 2");

// add them to the dataset

drc->Add (obj);

If you want to specify the data type of a particular column you should use the DataColumn::DataType property. You can set any of the following data types: Boolean, Byte, Char, DateTime, Decimal, Double, Int16, Int32, Int64, SByte, Single, String, TimeSpan, UInt16, UInt32, UInt64.

如果你想指定某列的数据类型，可以使用DataColumn::DataType。你也可以设置下面这些数据类型：Boolean、Byte、Char、DataTime、Decimal、Double、Int16、Int32、Int64、SByte、Single、String、TimeSpan、UInt16、UInt32、UInt64。

That's it! Well, that's all you have to do if you want to build up your data set manually. This is not how it is done if you want to connect to a real data source.

就这些内容了，如果你想手动建立数据集，这就是你需要知道的。如果你要连到一个真正的数据源的话，就不是这样了。

Binding a Database

帮定一个数据库

To connect to a database server (such as SQL Server) and fill a dataset from there you need three additonal objects: a Connection, a DataCommand and a DataAdapter object.

要连接到数据库服务器（例如SQL Server）并要填充数据集，你需要三个额外的对象：Connection、DataCommand、DataAdapter。

The Connection object is used to connect to the database object. You must provide a connection string to it. Also, if your application is using transactions, you must attach a transaction object to the connection.

Connection用于连接数据库对象。你必须为它提供一个连接字符串。如果你的程序使用事务的话，你也得把事务对象附加给连接。

The DataCommand object is used to sending commands to the database server. It includes four System::String objects named SelectCommand, InsertCommand, UpdateCommand and DeleteCommand. I believe it is obvious what these string objects represent, nothing else but the four basic SQL operations.

DataCommand对象用于给数据库服务器发送命令。它包含四个System::String类型的对象：SelectCommand、InsertCommand、UpdateCommand、DeleteCommand。它们的含义很明显，就是四个基本的SQL操作。

The DataAdapter object is the object that does all the magic work. It fills the DataSet with data from the database and updates data from the DataSet to the database.

DataAdapter对象很奇特。它能从数据库取出数据并填充数据集，还可以把数据集更新的内容反填到数据库。

You may now wonder what are the classes that correspond to the objects described above. Well, Microsoft have prepared two sets of classes - you can't say life is getting much easier in .NET can you? :). The first set is based on OLE DB and is part of the System::Data::OleDb namespace. It contains the following classes: OleDbConnection, OleDbCommand and OleDbDataAdapter. The other set of classes is optimized for working with Microsoft SQL Server. It is part of the System::Data::SqlClient namespace and its classes include: SqlConnection, SqlCommand and SqlDataAdapter.

你现在可能想知道对应上述描述的类是什么了吧？微软给我们准备了两组类——你不得不说.NET时代的生活都是那么的容易，不是吗？J 第一组是基于OLE DB的，同时也是System::Data::OleDb命名空间的一部分。它包含下面几个类：OleDbConnection、OleDbCommand和OleDbDataAdapter。另外一组是为微软SQL Server优化的类。它是System::Data::SqlClient命名空间的一部分，它的类包含：SqlConnection、SqlCommand和SqlDataAdapter。

Here is a complete example of how to connect to the database and fill the dataset. I have used the classes optimized for SQL Server. If you need to use OLE DB you only have to replace "Sql" with "OleDb". We try to fetch two tables Table1 and Table2 and set a parent-child relationship between them.

这里给出了如何连接数据库并填充数据集的完全代码。我使用了为SQL Server优化的类。如果你想使用OLE DB的话，只要把“Sql”替换成“OleDb”就可以了。我们试着取出Table1和Table2两个表并在它们之间建立一个主从关系。

// Create a database connection string

String* str = new String ("user id=sa;password=;initial catalog=MyDB;"

"data source=(local)");

// Create the database connection object

SqlConnection* sqlcon = new SqlConnection (str);

sqlcon->Open (); // open it

// create the first SQL query

String* strTable1 = String::Format ("SELECT * FROM Table1 "

"WHERE Field1 = {0}", FieldID.ToString ()); // FieldID is

// an interger used to filter our query.

// create the second SQL query. It joins the first table to select only those

// fields in relation

String* strTable2 = String::Format ("SELECT T2.* FROM Table2 T2 "

"INNER JOIN Table1 T1 ON T2.ParendID = T1.ID "

"WHERE T1.Field1 = {0}", Field1.ToString ()); // FieldID is

// an interger used to filter our query.

// create the SQL command objects. We pass in the contructor the

// SqlConnection object and the query string

SqlCommand* sqlTable1 = new SqlCommand (strTable1, sqlcon);

SqlCommand* sqlTable2 = new SqlCommand (strTable2, sqlcon);

// Create a data adapter for every table. We pass the SqlComand objects as parameters

SqlDataAdapter* Table1Adapter = new SqlDataAdapter (sqlTable1);

SqlDataAdapter* Table2Adapter = new SqlDataAdapter (sqlTable2);

// now we create the dataset object and we give it a name

DataSet* MyDataSet = new DataSet ("MyDataSet");

// we inform the dataset object about the tables it is going to contain

// by adding those tables to the dataset.

DataTable* Table1 = BackupDataSet->Tables->Add ("Table1");

DataTable* Table2 = BackupDataSet->Tables->Add ("Table2");

// now we are filling the Dataset using the Dataadapter objects

// We need not say anything to the DataSet object about the

// columns of the table and their data type. The DataAdapter objects

// takes care of everything.

Table1Adapter->Fill (Table1);

Table2Adapter->Fill (Table2);

// To ensure relationships between Tables we must add a DataRelation object

// We assume that between column 0 in Table1 and column 1 in Table2 there is

// a one-to-many relationship

MyDataSet->Relations->Add (Table1->Columns->Item[0],

Table2->Columns->Item[1]);

For details about relations and constraints you should read about the Constraint class and the two classes derived from it, ForeignKeyConstraint and UniqueConstraint.

关于关系和约束你可以参考Constraint类和从它继承而来的ForeignKeyConstraint和UniqueConstraint类。

Working with XML files

使用XML文件

DataSets can work with XML files very easily. There are two methods to serialize a DataSet object. These are DataSet::WriteXml and DataSet::WriteXmlSchema. The first one writes data to the XML file and may include also schema information. It is useful when you want to write an XML file that has schema information embedded. However, if you prefer schema information in a separate (.xsd) file you should use the DataSet::WriteXmlSchema method.

数据集跟XML很容易工作。有两个方法？？？？？？数据集对象。DataSet::WriteXml和DataSet::WriteXmlSchema。第一个是往XML文件里写数据，也可以包含架构信息。但是，如果想把架构信息存储在一个独立的文件（.xsd）中，你可以使用DataSet::WriteXmlSchema方法。

There are also plenty of classes in the System::Xml namespace: XmlWriter, XmlReader, XmlTextWriter, XmlDataDocument to name a few. You can use those with a dataset to perform some advanced XML operations. For instance, if you want to write a dataset to an XML file you can either use

System::Xml命名空间有许多类：XmlWriter、XmlReader、XmlTextWriter、XmlDataDocument等。你可以用这些配合数据集执行一些高级XML操作。例如，如果你想把数据集写入XML文件，也可以这样用

// recevies a DataSet in constructor

XmlDataDocument* xmlDoc = new XmlDataDocument(MyDataSet);

XmlTextWriter* Output = new XmlTextWriter ("C:\\Myfile.xml", NULL);

// perform some formatting

Output->Formatting = Formatting::Indented;

Output->Indentation = 2;

// and write it

xmlDoc->WriteTo (Output);

Output->Close ();

or use the DataSet::WriteXml method directly:

也可以直接使用DataSet::WriteXml

MyDataSet->WriteXml ("C:\\MyFile.xml", XmlWriteMode::WriteSchema);

In the latter situation I chose to embed schema information in the file by using one member of the XmlWriteMode enumeration. Other fields of the enumeration are XmlWriteMode::IgnoreSchema if you do not want to include schema information, XmlWriteMode::DiffGram if you want to include both original values of the dataset and any changes. For reading an XML file we use another enumeration: XmlReadMode.

稍后我用XmlWriteMode一个成员在文件中嵌入架构信息。不想包含架构信息的话可以用XmlWriteMode::IgnoreSchema，包含数据集原始值和变更值可以使用XmlWriteMode::DiffGram。读取XML文件要使用XmlReadMode。

A DataRelation has a property named Nested. When this property is set to true then every time the DataSet will write a record of a parent table, it will also nest all corresponding records in all child tables.

数据关系有一个Nested属性。当它的值是true的时候，数据集每次写主表记录的同时，也会嵌入所有的从表对应的记录。

Formatting data in an XML file is very flexible. By default, a new element is created for every column in every table. Assuming you have a table named Table1 with two columns named ID and Name the default output of an XML file will be:

XML文件中格式化数据很灵活的。一般情况下，每个表里的每一列都被建立一个对应的元素。假定你有一个名为Table1的表，其中有ID和Name列，默认输出的XML显示如下：

</Table1>

</Table1>

</MyDataSet>

If one column is to become an attribute of the node then you will set ColumnMapping property of DataColumn class. For that you must have a look at the MappingType enumeration. Its fields are: Attribute, Element, Hidden and SimpleContent. Choosing Attribute will write the corresponding column as an attribute of the parent node. The output will be like this:

如果一列要作为一个节点的属性，你需要设置一下DataColumn类里的ColumnMapping属性。事先你应该看一下MappingType。它的域值是：Attribute、Element、Hidden、SimpleContent。选择属性的话，会把该列作为父节点的一个属性。输出就像下面这样：

</Table1>

</Table1>

</MyDataSet>

SimpleContent means that tags for one column will not be written. If you chose Hidden then that column will not be written at all.

SimpleContent意味着那列的标记不会写出，只有值而已。如果你选了Hidden，列就会被完全忽略。

Of course you can combine them very easily. Doing this:

当然，你可以很容易的把它们组合起来。这样做：

Table1->Columns->Item[0]->ColumnMapping = MappingType::Attribute;

Table2->Columns->Item[1]->ColumnMapping = MappingType::SimpleContent;

will give you the following results:

结果如下：

</MyDataSet>

Reading from an xml file is just as easy as writing. The simplest options is to use the ReadXml method like this:

从Xml文件读取跟写是一样容易的。最简单的选择是用ReadXml方法：

MyDataSet->ReadXml ("C:\\MyFile.xml", XmlReadMode::ReadSchema);

I have read also schema information from the file. This means that the DataSet will automatically detect data type for every column in all tables in the dataset and also any constraints or relations between tables. This is really cool if you want to update a dataset. You change a value in the parent table and all child tables will have the parent value updated. Also, trying to change a value in a child table that does not have a parent value will throw an exception.

我从文件中读取了架构信息。这意味着DataSet会自动发现数据集中的每个表的列的数据类型和表间的约束与关系。如果你想更新数据集的话，这个真的是很酷。你可以更改主表的一个值，所有的从表也会更新。同样，如果把从表的值修改为一个对应的主表中不存在的值将拋出异常。

Updating the Database

更新数据库

Reading data into a dataset and then updating a database is just as easy as reading from a data source and filling a dataset. Assuming you are reading from an XML file and you are updating a SQL server, you must do the following:

把数据读入数据集并更新数据库就跟从数据源读取和填充数据集一样简单。假设你从XML文件读取数据，并更新一个SQL Server，你必须执行以下步骤：

Create the DataSet object and read the XML file. If your XML file contains any schema information then the DataSet will detect and create the corresponding tables and enable any constraints automatically.

Create a DataAdapter object for every table you want to update.

Call the Update method for every DataAdapter object.

l 建立DataSet对象并读取XML文件。如果你的XML文件包含架构信息的话，DataSet会自动发现并创建对应的表，生成相应的约束。

l 为你想要更新的每个表都创建一个DataAdapter对象。

l 为每个DataAdapter对象调用Update方法。

Remember that I mentioned four String members of the SqlDataAdapter class? These are: SelectCommand, InsertCommand, UpdateCommand and DeleteCommand. SelectCommand is the query used to fetch data from the database. You can define the other three objects if you want to perform custom update/insert/delete operations. If you do not want to do that you can use either SqlCommandBuilder or OleDbCommandBuilder class. This class will build those strings automatically.

还记得我提到的SqlDataAdapter类里的四个字符串成员吗？SelectCommand、InsertCommand、UpdateCommand和DeleteCommand。SelectCommand是用于从数据库提取数据的查询。如果你想执行自己的update/insert/delete操作的话，你可以定义其他三个对象。如果不想的话也可以使用SqlCommandBuilder或是OleDbCommandBuilder类。这个类会自动创建这些字符串。

Whenever you are writing data to a database is good practice to use a transaction to prevent concurrent writes to the database. For this. the .NET framework provides two classes: OleDbTransaction and SqlTransaction respectively.

不论你什么时候把数据写入数据库里，用事务来防止同时写入数据库是一个很好的办法。对于这个，.NET框架提供了两个类：OleDbTransaction和SqlTransaction。

Here is a sample of reading data from an XML file and the writing to a SQL Server.

这是一个从XML文件读入数据然后写入数据库的例子。

SqlTransaction* SqlTrans; // declare a transaction object

try

{

String* str = new String ("user id=sa;password=;initial catalog=MyDB;"

"data source=(local)");

// Create the database connection object

SqlConnection* sqlcon = new SqlConnection (str);

sqlcon->Open (); // open it

// create the data set object and give it a name

DataSet* MyDataSet = new DataSet ("MyDataSet");

// read the XML file

// I have also read the schema information file

MyDataSet->ReadXml ("C:\\MyXmlFile.xml", XmlReadMode::ReadSchema);

// Begin the transaction

SqlTransaction = sqlcon->BeginTransaction ();

// create the data adapters

SqlDataAdapter* Table1Adapter = new SqlDataAdapter("SELECT * FROM Table1", sqlcon);

SqlDataAdapter* Table2Adapter = new SqlDataAdapter("SELECT * FROM Table2", sqlcon);

// we have provided only the SelectCommand strings. To update

// the database we must provide the DeleteCommand, InsertCommand and

// UpdateCommand also.

// This can be done automatically with the command builder

// create the command builders for each data adapter

SqlCommandBuilder* Table1Command = new SqlCommandBuilder (Table1Adapter);

SqlCommandBuilder* Table2Command = new SqlCommandBuilder (Table2Adapter);

// we must specify the transaction used by these adapter.

Table1Adapter->SelectCommand->Transaction = SqlTrans;

Table2Adapter->SelectCommand->Transaction = SqlTrans;

// update the database

Table1Adapter->Update (MyDataSet, "Table1");

Table2Adapter->Update (MyDataSet, "Table2");

// don't forget to commit

SqlTrans->Commit ();

}

catch (Exception* e)

{

// if we have started the transaction then rollback it

if (SqlTrans != NULL)

SqlTrans->Rollback();

}

Conclusion

结论

DataSets provide a very easy to use and powerful way to handle large amounts of data coming from different tables or even data sources. A dataset will cache all data making it useful when you need to read data, perform intensive operations with it and then update it. It also provides full XML support making your life easier to share data across applications such as Web Services.

数据集提供了一个方便实用功能强大的方法来操作大量来源于不同表甚至是不同数据源的数据。一个数据集缓冲区能存储大量数据，你可以进行读取、操作和更新。它还提供了完全的XML支持，让你更加容易的实现如同Web Service那样的跨平台数据共享。