Data on the Outside vs. Data on the Inside
An article from Microsoft MSDN
An Examination of the Impact of Service Oriented Architectures on Data
By Pat Helland
Microsoft Corporation
Summary: Pat Helland explores Service Oriented Architecture, and the differences between data inside and data outside the service boundary. Additionally, he examines the strengths and weaknesses of objects, SQL, and XML as different representations of data, and compares and contrasts these models. (26 printed pages)
Contents
Assumptions About Service Oriented Architecture
Outside Data: Sending Messages
Representations of Data: Inside and Outside
Introduction
Up until now, most discussions of Service Oriented Architecture (SOA) have revolved around integrating disparate systems, leveraging companies' existing assets, or creating a robust architecture. All of these issues are relevant to SOA. Yet there are other significant and engaging issues involving SOA that are worth close attention. In its goal of connecting heterogeneous and autonomous systems, SOA adheres to several core design principles. One of these principles holds that independent services are made up of pieces of code and data and are interconnected through messaging. [Translator's note: "existing assets" here means making use of legacy systems.]
Indeed, services are inextricably tied to messaging in that the only way into and out of a service is through messages. However, services still operate independently of each other. Because of this unique relationship between services and messages, architects, developers, and programmers alike began asking critical questions. How does data flow between services? How are messages defined? What data is shared? How is data inside a service different from data outside a service? And how is data represented inside and outside services?
The answers to these questions exposed fundamental differences between data on the inside of a service and data that exists outside the service boundary. Data outside a service is sent between services as messages and must be defined in a way understandable to both the sending service and the receiving service. Data inside a service is deeply rooted in its environment. Unlike data outside services, data on the inside is private to the service. In fact, it is only loosely correlated to the data on the outside.
In response to the above findings, this paper leads readers into an in-depth discussion of data inside services and data outside services. Readers are introduced to different kinds of data outside services, including immutable, versioned, and reference data. The discussion then turns to data inside services, which involves messages (operators and responses), reference data, and service-private data. Next, the temporal interpretation of data inside and outside services is explored. Once the different kinds of data are identified, attention is given to the representation of data through an examination of three critical models: XML, SQL, and objects. [Translator's note: "versioned" data is read here as data stamped with a timestamp.]
Although SOA promises to continue stimulating conversation across enterprises and in the IT industry, the buzz accompanying it may now be about data inside services and data outside services. There is now a strong momentum for enterprises to not only bring SOA into their environments, but also to achieve a deeper understanding of their services and the behavior between services and data.
The Shift Towards Services
One central idea in SOA is that independent services consist of pieces of code and data, and that messages interconnect those services. Each service is a unique collection of code and data that stands alone and is independent of other services. However, each service is also interconnected with other services through messaging. This interconnection is what differentiates services from the silos that exist in many environments.
Messaging carries enormous importance in SOA. Messages flow between services. The schema definition for each message and the contract defining the flow of the messages specify the "black box" behavior of the service. Services are inextricably tied to messaging in that the only way into and out of a service is through messages. A partner service is aware only of the sequencing of the messages flowing back and forth.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig1.gif
Figure 1. Services and messages are tied together
Sometimes many related messages are sent between two different services. Related messages can flow between two services over the course of weeks or months. For example, an individual may reserve a train ticket on June 1, 2004 and then change the ticket to a different date on June 5, 2004. The individual may then confirm and pay for the reservation on June 10, 2004 and finally cancel the ticket on June 25, 2004. In this example, the individual sent messages every few days for a number of weeks. This is referred to as long-running work. Messages in long-running work are related, with the second message dependent on the first message and the third message dependent on the first two messages. A cookie or something similar is used to correlate the messages in a piece of long-running work. We avoid the phrase long-running transaction to eliminate confusion with atomic database transactions. In addition, the word transaction suggests an activity with a beginning and an end. Most long-running work impacts other applications in ways that ripple through multiple businesses, without a clear boundary where one piece of work ended and another started. [Translator's note: over a time span this long, an atomic transaction is simply not a workable tool.]
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig2.gif
Figure 2. Multiple related messages
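The correlation of related messages described above can be sketched as follows. This is a minimal illustration, not something taken from the article: the `Message` and `Conversations` names, the `correlation_id` field standing in for the "cookie", and the operator names are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    correlation_id: str  # hypothetical "cookie" tying related messages together
    sent_on: str         # ISO date kept as text for simplicity
    operator: str


class Conversations:
    """Groups incoming messages by correlation id, in arrival order."""

    def __init__(self):
        self._by_id = defaultdict(list)

    def receive(self, message):
        self._by_id[message.correlation_id].append(message)

    def history(self, correlation_id):
        # .get avoids creating an empty entry for an unknown conversation
        return [m.operator for m in self._by_id.get(correlation_id, [])]


conversations = Conversations()
for sent_on, operator in [
    ("2004-06-01", "RESERVE-TICKET"),
    ("2004-06-05", "CHANGE-DATE"),
    ("2004-06-10", "CONFIRM-AND-PAY"),
    ("2004-06-25", "CANCEL-TICKET"),
]:
    conversations.receive(Message("trip-42", sent_on, operator))
```

Each message is handled on its own, possibly weeks apart; only the shared identifier lets the receiving service see them as one piece of long-running work.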
To summarize, messages are a critical part of SOA. As the paper will show, messages must receive special care to ensure correct interpretation and to avoid confusion as they flow between different services.
Services vs. Components
There has been a natural evolution over the years from functions, to objects and components, to services. In the beginning, code was separated into functions, which allowed software to be grouped into smaller and better-organized pieces. Later, components and objects evolved, allowing data (member variables) to be encapsulated within them.
Currently, services have taken center stage in the evolution process. Services provide a coarser-grained form of separation, with more independence between the pieces of code than functions and components offer. First, services always interact with each other through messages. Second, they are normally durable, allowing them to survive system failures and restarts. Finally, services have opaque implementation environments in which only the messaging interactions are visible from the outside.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig3.gif
Figure 3. Services versus components
Considering Inside and Outside Data
It is important to make a distinction between the data inside services and the data outside services. Data outside a service is sent between services as messages and must be defined in a way understandable to both the sending service and the receiving service. Data inside a service is deeply rooted in its environment.
The need to interpret the data in at least two different services makes the existence and availability of a common schema imperative. The schema should also have certain characteristics. First, independent schema definition is important. This means the sender or receiver should be able to define message schemas without having to consult each other. Second, the message schema should be extensible. Extensibility allows the sending service to add information to the message beyond what is specified in the schema.
Note The sender of the message may or may not be the definer of the schema for the message.
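One way to picture schema extensibility is a receiver that checks only the fields the shared schema requires and tolerates anything extra the sender chose to add. The sketch below is an assumption-laden illustration: the `REQUIRED_FIELDS` schema, the field names, and the `read_order` function are hypothetical and do not come from the article.

```python
# Hypothetical shared schema: field name -> expected Python type.
REQUIRED_FIELDS = {"order_id": str, "sku": str, "quantity": int}


def read_order(message: dict) -> dict:
    """Validate the required fields and ignore any sender extensions."""
    order = {}
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in message:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(message[name], expected_type):
            raise TypeError(f"field {name} must be {expected_type.__name__}")
        order[name] = message[name]
    return order  # extension fields such as "gift_wrap" are simply not copied


incoming = {"order_id": "A-1", "sku": "SKU-1001", "quantity": 2, "gift_wrap": True}
order = read_order(incoming)
```

A receiver written this way keeps working when the sender starts adding fields, which is the practical benefit of extensibility.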
Unlike data outside services, data on the inside is private to the service. In fact, it is only loosely correlated to the data on the outside. Data on the inside is always encapsulated by service code so the only way of accessing it is through the business logic of the service.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig4.gif
Figure 4. Data inside and data outside services
Mainframes and Monoliths: All About Data
In the past, it was typical for a mainframe or other server system to support multiple applications. Applications accessed a shared database and worked on either a shared set of tables or different tables within the same database. Since all the tables were in a shared database on a large server, applications could perform a single transaction that accessed data from many different tables. Likewise, operations updating multiple tables could take advantage of transactional atomicity in their changes. Equally important, not only did the applications have access to all the data in the database, but they could also access tables managed by different applications. This has colored how people think about the relationship across applications, since applications had immediate access to the latest and, usually, most accurate information. While this type of access is contingent on security, authorization, and privacy concerns, most applications assume they can simply look in order to see the latest information.
In recent years, various economic and technological trends have resulted in applications steadily moving off to different machines. This has resulted in the fracturing of the monolith. A single giant system no longer runs all the applications in a typical organization. This, however, raises an issue because as applications move to different machines, it becomes more difficult to share the same data since data now resides on different machines. Updating across these machines also becomes difficult since the machines are designed to be independent and, typically, do not trust each other.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig5.gif
Figure 5. Mainframes and monoliths
Assumptions About Service Oriented Architecture
Major Tenets of Service Oriented Architecture
Up to this point, the discussion has covered code and data, systems, and messages. As with any subject worth writing down and sharing with others, there are core beliefs about it. Below is an outline of the four major tenets of Service Oriented Architecture, which address where code and data live, message format and content, the function of a service, and service compatibility.
1 Boundaries are explicit. This means there is no ambiguity about where each part of the code exists. Specifically, it is clear if the code resides inside or outside of the service. The same belief exists for data. It is known if a database table lives inside or outside the service.
2 Services are autonomous. This means each service has its own implementation, deployment, and operational environment. Therefore, a service can be rewritten without negatively impacting its partners, as long as the correct messages continue to be sent at the correct times.
3 Services share schema and contract, but do not share implementation. Schema describes the format and the content of the messages, while contract describes the message sequences allowed in and out of the service. What is not known is how the service is implemented. Consider the use of an Automated Teller Machine (ATM). Most people know how to interact with these machines. They know what buttons to press and they know the outcome. For example, John enters a PIN and then presses some buttons. Most people suspect a computer is involved in the entire process. However, they are typically unaware of how it is all implemented. [Translator's note: the order in which the buttons are pressed to withdraw cash is the contract.]
4 Service compatibility is based on a policy. Formal criteria exist for getting "service from the service." The criteria are set out in an English-language document that outlines the rules for using the service. Currently, the WS-Policy effort is working to formalize a declarative and programmatic way to express the policy requirements.
These are the basic principles of SOA and form the basis for the relationship between services.
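The third tenet separates schema (what a message looks like) from contract (which message sequences are allowed). A contract can be pictured as a small table of permitted transitions. The sketch below borrows the ATM example; the message names and the `CONTRACT` table are hypothetical, chosen only to illustrate the idea, and say nothing about how a real ATM is implemented.

```python
# Hypothetical contract: after each message, which messages may come next.
CONTRACT = {
    "START": {"INSERT-CARD"},
    "INSERT-CARD": {"ENTER-PIN"},
    "ENTER-PIN": {"REQUEST-CASH", "CANCEL"},
    "REQUEST-CASH": {"REQUEST-CASH", "CANCEL"},
    "CANCEL": set(),
}


def follows_contract(messages):
    """Return True if the message sequence is allowed by the contract."""
    state = "START"
    for message in messages:
        if message not in CONTRACT.get(state, set()):
            return False
        state = message
    return True


good = follows_contract(["INSERT-CARD", "ENTER-PIN", "REQUEST-CASH", "CANCEL"])
bad = follows_contract(["INSERT-CARD", "REQUEST-CASH"])  # PIN step skipped
```

A partner only needs this table and the message schemas; nothing about the code behind the service is exposed.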
Challenges with SOA
No matter what technology is brought on board to deal with the plethora of complex IT systems that make up today's enterprises, its beliefs and capabilities will be continuously challenged. In this section, attention is given to some of the existing challenges faced by Service Oriented Architecture.
To date, two of the largest challenges faced by SOA concern explicit boundaries and autonomy. Explicit boundaries crisply define what is inside a service and what is outside a service. A service consists of code and data. Unlike functions and components, the code and data of different services are disjoint, and the data of one service is kept private from every other service. These disjoint collections of code and data reside within explicit boundaries called services.
Autonomy speaks to the independence of services from each other. For example, Service-A is always independent of Service-B. As long as the schema and contract are maintained, no adverse impact is expected. As a result, each service is free to be recoded, redeployed, or completely replaced independent of the other service.
Even with autonomy and explicit boundaries, there are often other complications such as trust issues across boundaries. There needs to be trust between services at all times. To achieve this, a service must first decide on what kind of trust it wants and second, define its own style of trust. It is common for the rules that define trust to be modeled after real interactions across businesses. After all, it is issues such as credit card fraud that made everyone think about trust in the first place.
[Translator's note: "trust" here probably refers to authorization.]
The Debate About Transactions
Along with the beliefs and challenges that follow a technology, there are also debates. No matter where you turn there always seems to be a debate flourishing around some technology. Where SOA is concerned, one important debate is about transactions. On one side of the debate, people propose that atomic (ACID) transactions, perhaps implemented with 2-phase commit, be used across service boundaries.
Note WS-Transaction is currently involved with defining transactions that span service boundaries.
On the other side of the debate, people believe a service should never hold locks for other services because this involves a great deal of trust that the transaction's completion and, hence, record unlock will occur within a reasonable amount of time.
Upon closer analysis, this debate is really about the definition of the word service. If two pieces of code share atomic transactions, are they independent services, or is their relationship so intimate that they constitute one service? There will always be bodies of code connected by 2-phase commit; the question is whether or not they are in the same service.
This paper explicitly focuses on some of the challenges that arise when two services do not share atomic transactions. Some pieces of code share atomic transactions through 2-phase commit; others do not. This paper will examine some of the implications that arise when independent pieces of code do not share transactions. Hence, for this paper, the term service carries the connotation of independence, autonomy, and separate transactional scopes.
Operators and Operands
In a service oriented architecture, the interactions across services represent business functions. These are referred to as operators. The following are some examples of operators:
Please PLACE-ORDER.
Please UPDATE-CUSTOMER-ADDRESS.
Please TRANSFER-PAYMENT.
Operators are part of the contract between two services and are always about the business semantics of the stated service. Operators can also be a form of acknowledgement indicating the acceptance of an operator. Consider the following examples:
Acknowledge receipt of PLACE-ORDER.
Acknowledge completion of TRANSFER-PAYMENT.
An acknowledgment has business-defined depths. As a result, there is a difference between acknowledging receipt of a TRANSFER-PAYMENT request and acknowledging completion of the transfer. This all comes down to clearly defining the business semantics in the contract.
Operator messages may also contain many operands. An operand is a piece of information needed to conduct an operation. It must be placed in the message by the sending service. In its simplest form, an operand is a parameter of the message. Examples of operands include the identifier of the customer placing an order and the SKU number for a line item within the order.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig6.gif
Figure 6. Operator messages with operands
Let's consider how and where a service gets the operands it uses to prepare an operator message. Operands come from reference data, which is typically published by the target service for the operator.
Reference data is somewhat of a new kind of data in SOA. The word somewhat is used since versions of SOA have been in existence for decades with MQ, EDI, and other messaging systems. Similarly, before it was all computerized, humans were manually completing SOA-style operations. When customers wanted to order products from a department store, they filled out an order form and sent it in by mail. The department store catalog is reference data and the department store order is an SOA operation.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig7.gif
Figure 7. Publication of reference data and use of operands
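The relationship between reference data, operands, and operators can be sketched in a few lines. Everything named here is an assumption for illustration: the `CATALOG` dictionary stands in for reference data published by a hypothetical order service, and `OperatorMessage`, `place_order`, and `acknowledge_receipt` are invented helper names.

```python
from dataclasses import dataclass

# Reference data published by the (hypothetical) target service: its catalog.
CATALOG = {"SKU-1001": "Garden hose", "SKU-2002": "Rake"}


@dataclass(frozen=True)
class OperatorMessage:
    operator: str    # the business function being requested or acknowledged
    operands: tuple  # (name, value) pairs; a tuple keeps the message immutable


def place_order(customer_id, sku, quantity):
    """Build a PLACE-ORDER message whose operands come from reference data."""
    if sku not in CATALOG:
        raise KeyError(f"{sku} is not in the published catalog")
    operands = (("customer_id", customer_id), ("sku", sku), ("quantity", quantity))
    return OperatorMessage("PLACE-ORDER", operands)


def acknowledge_receipt(request):
    """Acknowledge that a request was received, not that it was completed."""
    return OperatorMessage("ACKNOWLEDGE-RECEIPT", (("of", request.operator),))


order = place_order("C-77", "SKU-1001", 2)
ack = acknowledge_receipt(order)
```

The sender reads the catalog to pick a valid SKU, much as a customer reads a department store catalog before filling out the order form.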
Outside Data: Sending Messages
Immutable and/or Versioned Data
Data exists in many forms. One type of data is immutable data. Essentially, immutable data is unchangeable once it is written. You can find immutable data almost anywhere in the real world. The following are a few examples:
The New York Times edition of June 3rd, 1975 is unchangeable
The first edition of a published book is unchangeable
The words spoken by the United States president on television are unchangeable
All immutable data have identifiers. An identifier ensures that the same data is yielded each time it is requested, no matter when or where the request is made. Therefore, if the same identifier is used, the same bit pattern is retrieved. For example, the pricelist from Joe's Internet Bazaar for Thursday, July 3, 2003 is the same pricelist whether it is retrieved today or at any other time.
Another type of data is versioned data. Versioning is a technique for creating immutable sets with unique identifiers. Through the availability of versioning, a person can ask about the latest service pack (SP) for Windows NT4, or a recent edition of the New York Times. Versioning also has different types of identifiers: version-dependent identifiers and version-independent identifiers.
Version-dependent identifiers include the desired version as part of the identifier. This identifier always retrieves the same immutable bit pattern. In contrast, version-independent identifiers do not include the desired version in the identifier. As a result, the process for resolving a version-independent identifier to its underlying data involves two steps:
Map from the version-independent identifier to the version-dependent identifier
Retrieve the immutable bit pattern of the version
The following is an example of the above steps from a real-world perspective. A person visits the newsstand to buy a recent edition of the New York Times. "A recent edition" is a version-independent identifier, so the person must first decide whether today's paper or yesterday's paper is needed. Having settled on today's paper, the person buys the edition dated July 20, 2004, which is the version-dependent identifier. Ultimately, with version-independent identifiers the answer will not be the same for each request. For example, if exactly the same request is made the following day, the person will receive a newspaper dated July 21, 2004, not July 20, 2004.
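The two-step resolution can be sketched with the newspaper example. The identifier format (`nyt/today`, `nyt/2004-07-20`), the `EDITIONS` store, and the function names are assumptions made for this illustration only.

```python
from datetime import date

# Hypothetical store of immutable editions, keyed by version-dependent identifier.
EDITIONS = {
    "nyt/2004-07-20": "edition of July 20, 2004",
    "nyt/2004-07-21": "edition of July 21, 2004",
}


def resolve(version_independent_id, today):
    """Step 1: map a version-independent identifier to a version-dependent one."""
    if version_independent_id != "nyt/today":
        raise KeyError(version_independent_id)
    return f"nyt/{today.isoformat()}"


def retrieve(version_dependent_id):
    """Step 2: fetch the immutable content for that exact version."""
    return EDITIONS[version_dependent_id]


first_day = retrieve(resolve("nyt/today", date(2004, 7, 20)))
next_day = retrieve(resolve("nyt/today", date(2004, 7, 21)))
```

Only the first step depends on when the request is made; the second step always returns the same bits for the same version-dependent identifier.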
Immutability of Messages
Every message traveling through a network may be retransmitted if it is lost. As a result, every message sent is guaranteed to be delivered zero or more times. This follows from:
Networks losing messages.
Networks retrying messages.
Those pesky retries actually being delivered.
It is important for retransmitted messages to remain unaltered no matter how many times they are sent or else a great deal of confusion and unhappiness would ensue. Therefore, all messages should be immutable.
Additional consideration needs to be given to messages traveling through the network. In the absence of a reliable messaging framework that uses serial numbers and retries, the end application may see duplicate messages. Additionally, careful thought must be given to the life of the messaging framework and the life of the endpoint. Consider a case in which long-running work takes weeks or months to complete. TCP/IP cannot be counted on to eliminate duplicates in this situation. If the system crashes and is restarted, a different TCP connection is obtained, which may result in duplicate messages being sent. Because confusion may arise from messages being resent, it is advantageous to have immutable messages so that the same bits are always delivered.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig8.gif
Figure 8. Immutability of messages
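Because a message may arrive more than once, a receiver often remembers which messages it has already processed. The sketch below is one possible approach, not the article's prescription: the `Receiver` class and its `message_id` parameter are hypothetical, and a real durable service would have to persist the record of seen messages rather than keep it in memory.

```python
class Receiver:
    """Applies each message at most once, replaying the stored reply for duplicates."""

    def __init__(self):
        self._replies = {}  # message id -> reply already produced (in memory only)
        self.balance = 0

    def handle(self, message_id, amount):
        if message_id in self._replies:
            # A retransmission: same id, same immutable content, same reply.
            return self._replies[message_id]
        self.balance += amount
        reply = f"applied {message_id}, balance {self.balance}"
        self._replies[message_id] = reply
        return reply


receiver = Receiver()
first_reply = receiver.handle("m-1", 50)
retry_reply = receiver.handle("m-1", 50)  # the network delivered a pesky retry
```

This only works if a retransmitted message carries exactly the same content as the original, which is why immutability of messages matters.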
To Cache or Not To Cache
Most conversations on caching end with a warning against the practice. This is not one of those cases. Caching immutable data is acceptable and, indeed, recommended, because each time the data is requested the same answer is guaranteed. There is no possibility of error in this situation. As a result, the cache never has to be shot down (invalidated). Caching data that is not immutable is risky because it can lead to anomalies.
It is also acceptable to cache versioned data since each version is immutable. There is never confusion over what data is returned from a cache involving a version-dependent identifier. The version-dependent identifier yields the same bits each time.
Note It is not recommended that anything be cached if it is referred to by a version-independent identifier. The results yielded in this situation are unpredictable.
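The caching rule can be illustrated with a cache that stores only entries addressed by a version-dependent identifier. The convention that a version-dependent identifier contains an `@version` suffix is purely an assumption of this sketch, as are the `VersionedCache` and `fetch_from_service` names.

```python
class VersionedCache:
    """Caches only data addressed by a version-dependent identifier."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._entries = {}

    def get(self, identifier):
        if "@" not in identifier:
            # Version-independent: the answer may change, so never cache it.
            return self._fetch(identifier)
        if identifier not in self._entries:
            self._entries[identifier] = self._fetch(identifier)
        return self._entries[identifier]


calls = []


def fetch_from_service(identifier):
    calls.append(identifier)  # record each real fetch for demonstration
    return f"contents of {identifier}"


cache = VersionedCache(fetch_from_service)
cache.get("pricelist@2003-07-03")
cache.get("pricelist@2003-07-03")  # served from the cache, no second fetch
cache.get("pricelist")
cache.get("pricelist")             # fetched again each time
```

Because a versioned entry can never change, the cache needs no invalidation logic at all.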
Normalization and Immutable Data
Normalization is an essential design consideration for database schemas. Because normalization ensures each data item lives in one place, it is easier to ensure updates do not cause anomalies. This is illustrated in a classic example involving an employee-manager database. The manager's phone number is commonly listed in each row for each employee in the database. A problem is encountered when trying to update the manager's phone number because it lives in numerous places.
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig9.gif
Figure 9. Problems with de-normalization
If the data is immutable, it may be practical to allow it to be de-normalized. For instance, it is acceptable for email messages to be de-normalized since the messages cannot change once they are sent out of a service. Likewise, if a message is sent between services and will be interpreted by the business logic of the services, it may be challenging to perform joins. Because of this, immutable messages frequently contain de-normalized information.
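The contrast between normalized data inside a service and de-normalized data in a message can be sketched as follows. The tables, names, and phone numbers are invented for illustration; the point is only that the join happens once, when the message is built.

```python
# Inside the service: normalized, so the manager's phone lives in one place.
managers = {"m1": {"name": "Ann", "phone": "555-0100"}}
employees = [
    {"name": "Bob", "manager_id": "m1"},
    {"name": "Cy", "manager_id": "m1"},
]


def build_message(employee):
    """Join once at send time and emit an immutable, de-normalized snapshot."""
    manager = managers[employee["manager_id"]]
    return (
        ("employee", employee["name"]),
        ("manager", manager["name"]),
        ("manager_phone", manager["phone"]),
    )


snapshot = build_message(employees[0])
managers["m1"]["phone"] = "555-0199"   # a single update, no anomaly inside
later = build_message(employees[1])
```

The earlier message still carries the old phone number, and that is fine: it is an immutable record of what was true when it was sent, so the receiver never needs to perform a join.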
Immutable Schema
Messages can be sent and resent, but at the end of the day if both the sender and the receiver do not understand the messages then nothing has been accomplished. As a result, every message needs a common schema. The schema used is typically referred to as meta-data. In the event the meta-data is ever changed, confusion will result. Accordingly, the schema used must always be known or should be able to be located if the message is going to be processed.
Stability of Data
Ensuring the immutability of data across service boundaries eliminates many problems, but it does not ensure the message is understood. For example, a reference to President Bush made in 2003 means something different than a reference to President Bush made in 1990. People may fail to notice the reference is to two different people and therefore, misunderstand the message.
The notion of stable data is introduced as having an unambiguous interpretation across space and time. This leads to the creation of values that do not change. For example, most enterprises never recycle customer identifications or SKU numbers. It is problematic to ensure the old interpretation no longer exists so these values are permanently assigned. Consider a banking situation. If a single piece of customer information comes out of the bank's archive years later and the bank tries to look up the customer's identification, it had better not refer to some other customer. By not reusing a customer's ID, it remains stable.
稳定消息的概念在具有跨时间空间的解释歧义的时候引入进来。这导致不改变值的创建。例如,大多数企业从来不会反复利用用户的证明或者SKU号码。如果确信来得解释不再存在,因为这些值应该永久的分派将出现问题。考虑银行的情况,假如若干年后一个银行用户的信息,银行要查找这个用户的证明,最好不要指到其他用户上去。通过不重新利用用户的ID,保持了稳定性。【译者:这里就是说银行的每个帐户都有一个唯一的用户ID,这样好查找】
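The rule of never recycling an identifier can be sketched in a few lines of Python. This is only an illustration: the `CustomerIdAllocator` class, the `CUST-` prefix, and the `retire` method are hypothetical names that do not come from the article.

```python
import itertools


class CustomerIdAllocator:
    """Hypothetical allocator: every ID is handed out once and never recycled."""

    def __init__(self):
        self._counter = itertools.count(1)  # only ever moves forward
        self.retired = set()

    def allocate(self) -> str:
        return f"CUST-{next(self._counter):06d}"

    def retire(self, customer_id: str) -> None:
        # Retiring records that the customer left; the ID is NOT returned to a pool,
        # so an archived document that mentions it can never point at someone else.
        self.retired.add(customer_id)


allocator = CustomerIdAllocator()
old_id = allocator.allocate()
allocator.retire(old_id)
new_id = allocator.allocate()
```

Because retired IDs stay retired, a record pulled from an archive years later still refers to the original customer or to nobody at all.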
Note: It is also worth mentioning that anything that is current is also not stable. A reference to the current activity on a credit card is not stable because it does not clearly communicate when the snapshot of the activity was accurate.
注意 下面的事情也值得提及一下,就是当卡你的任何事情都是不是稳定的。一个指向信用卡当前活动的引用不是稳定的,因为在活动的快照是精确的时候它没有清楚的通信。
In summary, data that is distributed must be both immutable and stable. Versioning is an excellent technique for creating immutable data. Finally, the data must refer to an immutable schema. The combination of these results in an interpretation of the message that is invariant across space and time.
总之,分布式的数据就必须是不可变的,稳定的。版本化是一项非常好的产生不可变数据的技术。最后,数据必须指向一个不可变的schama。这些的合并,将得到跨时间空间的不可变消息的正确解析。
A Few Thoughts on Stable Data
关于稳定数据的几点思考
Everything sent outside must be stable data so the interpretation of each message continues to be consistent across valid places and times to ensure the information is understood. Even data inside is sometimes stable. Cases like these occur when the data sent outside is also kept on the inside. Take for example a shopping basket and the product SKUs inside the basket. SKUs sit in a shopping basket service until they are sent to the order fulfillment service. When the SKUs are in the shopping basket service, they must be stable because they are living across multiple interactions with the inside.
每个被发送的数据都必须是稳定的,一次每个数据的解释都是一致的,即使跨时间空间,来确保每个信息都是可理解的。甚是数据内部又是也是稳定的。当数据发送到外面,他也在内部保存的情况也经常有。举个例子,一个购物筐和在购物筐中的产品SKU码。SKU码一致保存在购物筐中,直到他们被发送到订单完成服务。当SKU码在购物筐服务中的时候,他们必须是稳定的,阴文他们存在于多个交互的情景中。
Validity of Data in Bounded Space and Time
特定时间空间内数据的有效性
Bounding data in space and time makes it known where and when the data is valid. Placing an expiration date on data, such as "Offer is good until next Tuesday," is one example of bounding the validity of data in time. There may also be information on where data is valid. Examples follow:
通过限定数据的时间和空间,也就是数据在哪里在什么时候是有效的。给数据加上一个有效日期。例如,这个价格一直到下周四都是有效的是一个数据时间有效性的例子还有一些数据在哪里的信息有效性。下面有几个例子:
The offer is only good to Washington State residents
Data is valid only on these two servers
The information is valid only for the Acme & Sons Company Accounting application
这个价格只针对于华盛顿州的居民
数据只有在两台服务器上是有效的
数据只针对与the Acme & Sons Company的帐务应用系统有效
It is imperative that all valid data also be immutable and stable. Moreover, if valid data is retrieved then the same data should be produced and its interpretation should be unchanged.
所有有效的数据也必须是不可改变的和稳定的。再者,如果查找有效数据那么应该产生同样的数据,并且它的解释不应该改变。
Before deciding that data is valid everywhere and at all times, consider whether ranges of validity would be beneficial. If they would, document the constraints in the message. Ultimately, it is wise to define the validity of any data sent outside of a service.
在决定数据在任何时间空间都是有效的之前,考虑是否这种有效性的定义是有理的。如果他们是在消息中的数据。最后,定义任何发送到服务之外的数据的有效性是明智的。
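One way to document validity constraints in a message is to carry them as fields of the data itself. The Python sketch below assumes a hypothetical `Offer` type; its field names, the SKU, and the region code are illustrative and not part of the article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Offer:
    """Hypothetical immutable offer that carries its own validity bounds."""
    sku: str
    price_cents: int
    valid_until: date         # temporal bound, e.g. "good until next Tuesday"
    valid_regions: frozenset  # spatial bound, e.g. Washington State residents only

    def is_valid(self, on: date, region: str) -> bool:
        # The offer may be relied upon only inside both bounds.
        return on <= self.valid_until and region in self.valid_regions


offer = Offer("SKU-123", 1999, date(2004, 7, 20), frozenset({"WA"}))
```

A receiver can then call `offer.is_valid(some_date, some_region)` before acting on the data, instead of assuming it is valid everywhere and forever.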
Rules for Sending Data in Messages
发送消息的rule
The following table offers some distilled rules for sending messages outside of service boundaries:
下表给出了一个精心准备的关于发送消息的rule
Table 1. Rules for sending data
Identify the messages
识别消息
Always put a unique ID in every message
总是在每个消息中增加一个唯一的ID
Part of the unique ID may be a version
这个唯一的ID的一部分可以是版本信息
Immutable Data
不可改变消息
The data in a message must be immutable
消息中的数据必须是不可改变的
Never change the contents of a message on retransmission
永远不要改变重新发送消息的内容
Always return the same bits
总是返回相同的数据
Ok to Cache
缓存
Since the ID of the message returns the same data, it is Ok to cache a message
既然通过消息ID获得相同的数据,那么缓存消息是OK的
The cache will never cause incorrectness
缓存永远不会造成错误
Define Valid Ranges
定义有效的边界
Define the valid ranges of space and time
定义有效的时间和空间边界
Ok to always be valid
总是定义有效性
Must be Stable
一定要稳定
Ensure the meaning of the message is unambiguous within the valid space and time
确保在有效的时间和空间内,消息是无歧义的
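The first three rules in the table (unique ID, immutable contents, safe caching) can be illustrated together. This is a minimal Python sketch under stated assumptions: the `Message` and `MessageCache` classes and the `name:vN` ID format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical immutable message; frozen=True blocks attribute reassignment."""
    name: str     # logical name of the data, e.g. "price-list"
    version: int  # the version forms part of the unique ID
    body: bytes   # the exact bits; never altered on retransmission

    @property
    def message_id(self) -> str:
        return f"{self.name}:v{self.version}"


class MessageCache:
    """Cache keyed by message ID; safe because an ID always maps to the same bits."""

    def __init__(self):
        self._store = {}

    def put(self, msg: Message) -> None:
        existing = self._store.get(msg.message_id)
        if existing is not None and existing.body != msg.body:
            raise ValueError("same ID with different contents violates immutability")
        self._store[msg.message_id] = msg

    def get(self, message_id: str):
        return self._store.get(message_id)


cache = MessageCache()
first = Message("price-list", 7, b"widget=19.99")
cache.put(first)
cache.put(Message("price-list", 7, b"widget=19.99"))  # retransmission: same bits
```

The cache rejects a message that reuses an ID with different contents, which is the situation the immutability rule is meant to prevent.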
Outside Data: Reference Data
外部数据:引用数据
What Is Reference Data?
什么是引用数据
Reference data is information published across service boundaries. For each collection of reference data there is one service that creates it, publishes it, and periodically sends it to other services. There are three broad uses for reference data: operands, historic artifacts, and shared collections of data. Sometimes, the distinction between their uses is not rigid and may overlap.
应用数据是跨越消息界限发布的信息。对于每个引用数据的集合,总是有一个服务创建它,发布它,周期性的发送非其他服务。对于引用数据有三个主要的应用:操作数,历史数据,和共享的数据集合。又是他们之间的区分是不明显的,可以重叠的。
Operands and Operators
操作数和操作符
Operands add crucial information such as parameters or options to create the operator requests sent out for processing. The following are examples of operands:
操作数添加至关重要的信息,譬如参数,或者发送给处理过程的创建操作符的选项。下面是操作数的例子。
The customer-ID for the customer placing the order
The part numbers for the parts being ordered
The expected shipment date and the price agreed to for the order
设定订单用户的用户ID
被order的各个部分的每个部分的编号
期望的装货日期和商定好的价格
The data for operands is published as reference data. Reference data is typically sent out on different schedules as required. Consider the following:
操作数的数据作为引用数据发布。引用数据要求按照不同的时间表发送,看看下面的例子。
The customer database is sent daily as a snapshot
The parts catalog is updated weekly
The price-list is updated daily
用户数据库每天发送快照
零件数据库每周更新
价格列表每天更新
Versioned reference data is published by the authoritative service so its partners have the timely operands needed to ensure information accuracy. It is essential that the operator requests are processed understanding that the operands are derived from versioned reference data. This is just like specifying that an order to a department store catalog is based on the Fall and Winter edition of the catalog.
版本化的引用数据被 authoritative 服务发布,因此它的partner们拥有即使的操作数,来保证信息的准确性。操作请求处理要理解操作数是从版本化引用数据得来的,这一点至关重要。这就象指定一个发给商店目录的请求是基于秋天目录还是冬天目录。
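The catalog analogy can be sketched as an order that names the price-list version it was built from. The data, the `price_list_version` field, and the `price_order` function below are hypothetical and serve only as an illustration in Python.

```python
# Hypothetical versioned price lists published by the authoritative service.
PRICE_LISTS = {
    "2004-07-19": {"SKU-1": 500, "SKU-2": 1250},
    "2004-07-20": {"SKU-1": 525, "SKU-2": 1250},
}


def price_order(order: dict) -> int:
    """Price an order against the price-list version it names, not the latest one."""
    prices = PRICE_LISTS[order["price_list_version"]]
    return sum(prices[sku] * qty for sku, qty in order["lines"].items())


order = {
    "customer_id": "C-42",
    "price_list_version": "2004-07-19",  # plays the role of the catalog edition
    "lines": {"SKU-1": 2, "SKU-2": 1},
}
total = price_order(order)
```

Because the operator request carries the version, the receiving service and the sender agree on which prices were meant even after a newer list has been published.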
Historic Artifacts
历史数据
Historic artifacts are another type of reference data. Their purpose is to document past information within the transmitting service. Related services receive and use historic artifacts to perform other business operations. Examples of historic data include:
历史数据是另一种类型的引用数据。它的目的是 记录服务创送过的历史数据。相关的服务接受来说十角来执行其他的业务操作,历史数据的例子包括:
Quarterly sales results
Monthly bank statements
Monthly bills
每个季度的销售结果
每个月的银行 statements
每个月的帐单
The reference to monthly bills needs further discussion. Not only do monthly bills include historic artifacts about activity during the past month, but monthly bills also request customers make a payment. This request defines a business function. Therefore, monthly bills also behave as operators.
对于每个月的帐单的引用还需要进一步的讨论。每月帐单不仅包括上一个月活动的历史数据,而且每月帐单要求客户做一次结帐。这就要求定义一个业务功能。因此每个月的结帐被作为一个操作。
The use of historic artifacts raises many privacy concerns. However, in almost all cases, historic artifacts are only shared between closely related services that are trusted, or the receiving partner is sent specific pieces of the service's data appropriate for the partner's viewing. An example of tightly related and trusted services involves quarterly sales results. These results are published by the sales supporting services and sent to the accounting department's services. Inventory status rollup is then passed to accounting. Alternatively, a bank statement or a phone bill sent to a customer illustrates historic artifacts that are transmitted to a single partner.
历史数据的使用导致了很多隐私关注的出现。不管怎样,在大多数情况下,历史数据只是被相互信任的服务共享,或者the receiving partner is sent specific pieces of the service's data appropriate for the partner's viewing.一个紧密相关相互信任的服务的例子包括季度的销售结果。这些结果被销售支持服务发布,发送给帐务部门服务。目录状态也传送给帐务。可以替换的是,一个银行statement或者电话帐单发送给用户证明了历史数据发送给单个用户的情况。
http://msdn.microsoft.com/library/en-us/dnbda/html/dataoutsideinside_fig10.gif
Figure 10. Publication of Historic Artifacts
图 历史数据的发布
Shared Collections
共享集合
Reference data sometimes shares the same collection of data across an enterprise or different enterprises. Even after this collection of data is accessed across an enterprise, it continues to evolve and change. Typically, there is one special service that owns this information and is responsible for updating and distributing new versions of the data across systems. Examples of shared collections of data include:
引用数据有时共享一个企业或者不同企业的数据集合。即使这个数据集合跨企业被访问,它也持续的发展变化。典型的是,有一个特殊的服务,它拥有信息还负责升级和发布新版本的信息。共享集合的数据例子包括:
Customer database - It contains all the relevant information about the customers of the enterprise.
Employee database - It contains information about every employee in the enterprise.
Parts database and pricelist - It contains descriptions of the parts, SKUs, and their characteristics. Also included are the prices for the various SKUs and the discount policies for customers.
客户数据库 - 它包括所有企业用户数据的信息
雇员数据库 - 它包括企业内部所有雇员的信息
零件数据库和价格列表 - 它包括零件描述,SKU,以及他们的特征。还包括不同SKU的价格以及对用户的折扣策略。
It is worth noting that the version distributed across an enterprise is never the latest version of the data. It is generally impossible to have universal knowledge of the latest information in a loosely coupled distributed system. As a result, those interested in the data must settle for a recent version because only the authoritative service knows the latest state of the information. Even if the authoritative service tries to meet a request for a more recent view of the information, the data can change by the time the partner system receives it. By the time the partner service sees it, the authoritative service cannot guarantee that it is the most recent copy.
没什么,跨越企业发布的版本永远不是数据的最新版本(It is worth noting that the version distributed across an enterprise is never the latest version of the data.)。在一个分布式松散耦合的系统中,基本不可能获得数据最新的信息。结果是,those interested in the data must settle for a recent version ,因为只有authoritative服务知道最新的信息状态。尽管authoritative服务尽量的满足对最新信息的请求,但是partnet接受数据的时候,可能这个数据就已经改变了。partner看到的时候,authoritative服务不能保证这就是最新的拷贝。
Although shared collections of data have proven their place in the world, they represent a huge challenge for most enterprises. Many different applications have their own opinions of the correct value of the data for a customer. For instance, applications differ on what constitutes a customer. They also differ on the data needed to describe a customer. Lastly, the representations of certain data items describing a customer are not semantically aligned across different applications. However, there is currently a trend to consolidate these disparate opinions. This is, unfortunately, much like herding cats!
尽管共享集合数据已经证明了他们的作用,但是他们对大多数企业提出了巨大的挑战。许多应用关于数据正确性具有他们自己的观点。例如,什么确定用户的存在的区别。还有关于需要什么样的数据描述用户的区别。最后,描述用户的特定数据项的表示在各个不同应用之间没有取得一致。然而,现在有加强这个不同观点的趋势。这很不幸,就好像herding cats。
The general course of action is to create an authoritative source to manage the state of each shared collection of data and distribute a recent version to those requesting it. Disparate applications, for instance, are evolving to receive descriptions of the shared data from the customer manager service. In an attempt to align this data, the schema for each customer includes enterprise wide standard fields and includes extensions used by special applications. However, nothing can align perfectly across an enterprise. Also, some of the interested parties have their own extensions, which should be managed by the authoritative source. This presents yet another challenge.
通常的做法是创建一个authoritative源来管理共享数据的状态,并且发布这些数据的最新的版本给那些请求的partner。例如,不同的应用要接受从客户管理服务的共享数据。为了统一这些数据,每个用户的schema包括企业广泛标准字段,包括特殊应用的扩展。然而,没什么能做到跨企业完全统一。而且,一些感兴趣的组织有他们自己的扩展,它应该可以通过authoritative资源来管理。这也是另一个挑战。
Requesting Changes to Shared Data
请求改变共享数据
Sometimes a service other than the authoritative service wants to see changes made to the contents of the shared collection of data. Because this service cannot directly update the contents itself, it sends a request to the owner of the data. This allows the authoritative service to manage the data. These requests must be business operations that are represented as service operators across services. Only the business functionality that the authoritative service chooses to export will be implemented as operators. If the authoritative owner of the data agrees to make a change, the change is subject to the business logic enforcement of the authoritative owner of the shared collection. There are also situations when a new version of the entire shared collection of data is transmitted to the partner services instead of a few changes. Sending a copy of the entire customer database to the partner systems is one example.
有时,除了authoritative服务,其他服务想要知道共享数据集合的变化。因为这个服务不能直接更新数据内容,所以要发送请求到数据拥有者。这是authoritative服务区管理数据。这些请求必须是业务操作,通过服务操作符来表示。只有想要从authoritative服务发布出来的被支持的业务功能才能被用作操作符。假如数据的authoritatice拥有者同意进行改变,
Note: No matter how minor or major, any changes made must be considered carefully since changes to shared collections of data have business side effects that must be implemented by the authoritative service.
注意:不管是多大的改动,必须谨慎考虑,因为对共享数据集的改变会对业务产生影响。所以必须通过authoritative服务来进行。
What about Optimistic Concurrency Control?
关于最优化并发控制
Optimistic concurrency control refers to a scheme in which a reader makes changes to data and then proposes these changes to the authoritative service. Along with the proposal, the reader sends its original view of the data back to the authoritative service. If the original data is still intact, the authoritative service makes the proposed changes.
最优化并发控制是指当一个reader对数据进行改动的时候,然后提交改动非authoritative服务。这个reader给authotitative服务原始的数据回去。加入这个原始的数据仍然是没被人改动过的,那么authoritative服务执行被提交的改变。
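The mechanism itself can be sketched as a compare-then-write check. The Python below is a minimal illustration; the `CustomerStore` class and its method names are hypothetical, and it shows only the mechanism, not a recommendation to use it across service boundaries.

```python
class CustomerStore:
    """Hypothetical single-record store illustrating optimistic concurrency control."""

    def __init__(self, record: dict):
        self._record = dict(record)

    def read(self) -> dict:
        return dict(self._record)  # hand out a copy, never the live record

    def write_if_unchanged(self, original: dict, proposed: dict) -> bool:
        # Apply the change only if the record still matches what the reader saw.
        if self._record != original:
            return False  # collision: someone else changed the record first
        self._record = dict(proposed)
        return True


store = CustomerStore({"name": "Joe", "city": "Seattle"})
seen = store.read()
ok_first = store.write_if_unchanged(seen, {**seen, "city": "Redmond"})
ok_stale = store.write_if_unchanged(seen, {**seen, "city": "Tacoma"})
```

The second write fails because it is based on a view of the data that is no longer current.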
Sometimes, it is proposed that services use optimistic concurrency control across boundaries. This makes several assumptions:
有时,提出服务采用跨服务边界的最优化并发控制。重要作出如下设想:
The outside service is allowed to read the data. However, privacy issues and encapsulation issues may make this situation problematic.
The owner of the data trusts the business logic of the outside service. After all, some logic must have executed on the outside to create the image of the data that is being proposed for write.
Updates across service boundaries have little to do with optimistic concurrency control. Instead, they have a lot to do with the logic surrounding it. It is all about trust and who can make changes to the data. Services by nature are distrusting so they will not allow foreign business logic to create data to be stored in the local database. Only the local business logic can create data to be stored in the local database. Therefore, it is important for partner services to be aware that they cannot just change the data sent to them, but should ask for the change to be completed by the authoritative owner of the data.
Example: Updating a Customer's Address
Many times, people offer a counter example to the discussion above wherein they propose that the management of a customer record should be done using optimistic concurrency control. Let's consider why this is problematic across service boundaries.
Figure 11. Versions of a shared collection and a request for update
One may initially believe that it is appropriate to allow the salesperson to update some fields of the customer record directly and submit these to the authoritative service as a "write," assuming there is no optimistic concurrency collision. This is problematic for a couple of related reasons. First, the authoritative owner of the data does not want to yield control of the ability to change the data to some other service's business logic. It wants to be responsible for the integrity of the data and, hence, wants its own business logic to be responsible for the change. Second, the change to the field may have business implications that need enforcement. For example, changing the address may have tax implications for the customer, may change the responsible salesperson, and may require that any in-flight shipments be redirected. Therefore, it is not enough just to change the field in the customer record. In summary, it is essential that interactions with the authoritative service be oriented around business functions such as "please update customer address."
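A request oriented around a business function might look like the following Python sketch. The `CustomerService` class, its `update_customer_address` operation, and the recorded side effects are hypothetical; they stand in for whatever business logic the authoritative owner actually enforces.

```python
class CustomerService:
    """Hypothetical authoritative owner of the customer collection."""

    def __init__(self):
        self._customers = {"C-42": {"address": "1 Main St, Seattle, WA", "state": "WA"}}
        self.side_effects = []  # records follow-up work, for illustration only

    def update_customer_address(self, customer_id: str, address: str, state: str) -> bool:
        """Business operation: the owner's own logic decides and applies the change."""
        customer = self._customers.get(customer_id)
        if customer is None or not address.strip():
            return False  # request rejected by local rules
        if customer["state"] != state:
            self.side_effects.append("recompute tax for " + customer_id)
        self.side_effects.append("redirect in-flight shipments for " + customer_id)
        customer["address"], customer["state"] = address, state
        return True

    def get_address(self, customer_id: str) -> str:
        return self._customers[customer_id]["address"]


service = CustomerService()
accepted = service.update_customer_address("C-42", "9 Oak Ave, Portland, OR", "OR")
```

The caller never writes the record. It asks for a change, and the owner applies it together with the business consequences it is responsible for.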
Publishing Versioned Reference Data
To ensure reference data is not mixed up, the publisher first defines a name for the data. Version numbers are then added to each data update. Examples of versioned reference data involving operands, historic artifacts and shared collections follow:
Operands: A price list dated Monday, July 19, 2004
Historic artifacts: A request is made for Joe's February bank statement from two years ago
Shared collections: A customer database dated Thursday with a 10am timestamp
Updated information is always versioned to distinguish between copies of data. When the information is ready to be sent, many transmission techniques are available like messaging or file copy. Currently, FTP is the most common transmission technique in use. Ultimately, it is not very important what technique is used to transmit information. What is important is the way operands, historic artifacts, and shared collections work with cross-service computing.
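Naming the data and numbering each update can be sketched as follows. The `ReferenceDataPublisher` class is a hypothetical Python illustration; it keeps versions in memory and leaves the transmission technique out entirely.

```python
class ReferenceDataPublisher:
    """Hypothetical publisher: each update gets a name plus the next version number."""

    def __init__(self, name: str):
        self.name = name
        self._versions = []  # list index + 1 is the version number

    def publish(self, snapshot: dict) -> tuple:
        # Store a private copy so later edits by the caller cannot alter a version.
        self._versions.append(dict(snapshot))
        return (self.name, len(self._versions))

    def fetch(self, version: int) -> dict:
        return dict(self._versions[version - 1])


publisher = ReferenceDataPublisher("price-list")
working = {"SKU-1": 500}
v1 = publisher.publish(working)
working["SKU-1"] = 525  # the publisher prepares its next edition
v2 = publisher.publish(working)
```

Each `(name, version)` pair keeps returning the same contents, so partner services can tell copies apart and cache them safely.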
Data on the Inside
Until now, discussion has focused on data on the outside: the transmission of messages across networks, the importance of immutable and stable data, and data publication across services. The next section focuses on data on the inside.
Messages Are Data
All services receive messages. These messages contain operators asking the service to perform a function. The function may be a business instruction or perhaps a product order. The function may also be to accept some incoming reference data.
Once a service receives a message, it records and commits the message as data in a Structured Query Language (SQL) database. This ensures the data is stored and retrievable. Next, a transaction marks the incoming message as consumed. As a part of that transaction, outgoing messages are queued in the local database for later transmission. Finally, the whole transaction is atomically committed. If the transaction terminates before the entire operation is committed, the incoming message reappears in the queue and the transaction is retried. Because a transaction may abort, outgoing messages are never sent until the transaction commits; this keeps message transmission atomic with the rest of the transaction. If, for instance, a message were sent and the transaction then aborted, the message might still be processed by its recipient even though the transaction failed. Atomicity would be violated in that situation because the transaction would not have been completely undone.
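The consume-and-queue pattern can be sketched with Python's built-in `sqlite3` module. The table names (`inbox`, `outbox`), the column names, and the `process` function are hypothetical; a real service would also need a sender that transmits queued rows after commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (id TEXT PRIMARY KEY, body TEXT, consumed INTEGER DEFAULT 0)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, body TEXT, sent INTEGER DEFAULT 0)")
conn.execute("INSERT INTO inbox (id, body) VALUES ('m-1', 'order SKU-1 x2')")
conn.commit()  # the incoming message is now durable data


def process(conn, message_id, fail=False):
    """Consume one message and queue its reply in a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE inbox SET consumed = 1 WHERE id = ?", (message_id,))
            conn.execute("INSERT INTO outbox (body) VALUES (?)", ("ack " + message_id,))
            if fail:
                raise RuntimeError("simulated abort before commit")
        return True
    except RuntimeError:
        return False  # nothing was consumed or queued; the message can be retried


def pending(conn, table, column):
    """Count rows in a table whose flag column is still 0."""
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} = 0").fetchone()[0]


aborted = process(conn, "m-1", fail=True)
after_abort = (pending(conn, "inbox", "consumed"), pending(conn, "outbox", "sent"))
committed = process(conn, "m-1")
after_commit = (pending(conn, "inbox", "consumed"), pending(conn, "outbox", "sent"))
```

After the simulated abort the message is still unconsumed and nothing is queued for sending. After the successful run the message is consumed and exactly one outgoing message waits in the outbox.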
In addition to the transactional support achieved by storing the incoming messages in the SQL database, there are business benefits. The contents of the messages can be easily retrieved for different purposes such as audits and business intelligence analysis. As an added benefit, data in a SQL database allows for management and monitoring of the ongoing work to be based on SQL queries.
Kinds of Data Inside Services
Inside of services, three classes of data are found:
Table 2. Data inside services
Reference Data from Other Services
This reference data is transmitted by one service to another service that reads and stores the data.
This includes three types of reference data: operands, historic artifacts, and shared collections.
Periodically, new versions of the reference data are received and stored internally along with the appropriate version identifier.
Messages (Operators and Responses)
Messages refer to both incoming and outgoing operators (requests) and their responses.
Service-Private Data
Service-private data is internally maintained data that is never directly exposed outside of the service.
Note: This data is frequently exposed indirectly through business logic.
The following sections examine the characteristics and uses of the three classes of data mentioned in the above table.
Service-Private Data
Service-private data resides inside a service and is protected by the business logic of that service. Its contents are not readily known or available to anyone besides the service in which it lives. The only way to read service-private data from outside the service is by calls through the business logic, which controls what data is exposed.
Typically, this data is heavily processed to yield a particular result: highly specific information. Consider a bank's ATM. Customers are not aware of the bank's backend data from their account management system. Customers may only see their account balance or their last 10 banking transactions. This information has all been heavily processed by the bank's business logic. The only way to change the service-private data from the outside is by submitting a business operation to the service. For instance, altering bank data may occur when a customer performs a transaction such as a withdrawal. This transaction changes the bank's backend service-private data indirectly through the business operation of the withdrawal.
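The ATM example can be sketched as a class whose data is reachable only through its operations. The `AccountService` class, the account ID, and the amounts in cents are hypothetical, and Python's name mangling is used here only to discourage outside access, not to enforce it.

```python
class AccountService:
    """Hypothetical bank service; balances are private to the service."""

    def __init__(self):
        self.__balances = {"A-1": 10_000}  # cents; name-mangled to discourage outside access

    def get_balance(self, account_id: str) -> int:
        # The business logic decides what is exposed: a balance, not the ledger.
        return self.__balances[account_id]

    def withdraw(self, account_id: str, amount: int) -> bool:
        """Business operation: the only path by which outsiders change the balance."""
        balance = self.__balances[account_id]
        if amount <= 0 or amount > balance:
            return False
        self.__balances[account_id] = balance - amount
        return True


bank = AccountService()
first_ok = bank.withdraw("A-1", 4_000)
overdraw_ok = bank.withdraw("A-1", 7_000)
```

The second withdrawal is refused by the service's own rules, and the caller never touches the stored balance directly.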
Replication of Reference Data
A single publisher commonly sends reference data to many subscribers. Since reference data is both immutable and versioned, it is easy to replicate across different services. It is also easy to keep many copies of the data since no semantic ambiguity exists.
Reference data may also be imported into a service. As it is read into a service, it may be processed, reformatted, and indexed. Data stored in the service remains immutable. While the syntax and internal representation of the reference data may be changed to suit the needs of the service, the semantics remain intact and are considered a representation of the same immutable data.
Note: Reference data stored inside the subscribing services are replicas. However, there is no real distinction between replicas and copies of data because they are all immutable.
Figure 12. Transaction use of data on the inside
Data: Then and Now
Significant differences exist between the temporal interpretation of data inside services and outside services. This section examines the different perspectives and uses of time in a service oriented architecture.
Life in the Now: Transactions and Inside Data
Transactional systems have worked hard to give application developers the sense that they alone are manipulating the database. Serializability means that, for any given transaction, every other transaction that interacts with the database appears to occur either entirely before or entirely after it.
It is reasonable to consider that transactions, as they are executing, live in the "now". Due to serializability, there is no possibility that anything else is happening concurrently from the application's perspective. Although subject to security, the application is able to examine and modify almost anything in the database viewing the most up-to-date information possible. This leads to a perspective that the application lives firmly in the "now".
Inside a service, the business logic of the service is dealing with only the latest-and-greatest view of the service's private data. In this fashion, it is said that the service logic is deeply tied to the "now" of the service and its service-private data.
Life in the World of Then: Data on the Outside
When operators are sent in messages, they are requests for business functions to be performed. In effect, the sender is hoping the service will perform the business function in the not-too-distant future. As the transaction commits on the sending node, it is perfectly clear the requested operation has not yet occurred, but will happen in the near future.
Consider the various kinds of reference data sent between services and the temporal semantics of the data. As discussed above, reference data falls into three broad categories: operands, historic artifacts, and shared collections.
Operands describe the data used to form an operator request for a business function. Just like a department store catalog, operands are typically valid for a specified amount of time. Outside of that time, the operands are likely to be invalid if used in the creation of an operator. While this range of time is likely to include the present, it clearly is not about the immediacy of the "now". It is a supported range of "then".
Historic artifacts describe what went on inside the publishing service in the past. There is no information about the use of historic artifacts dealing with the present or the future; it is only about the past.
Shared collections of data are also about the past. For example, the authoritative service may publish a view of the state of the enterprise's customers on a daily basis. Partner services do their work based on last night's view of the state of the customers. Again, the perspective is one dealing with "then".
Services: Dealing with Now and Then
One of the biggest challenges in the transition to Service Oriented Architectures is getting programmers to accept that they have no choice but to understand both the "then" of data that has arrived from partner services, via the outside, and the "now" inside of the service itself.
The semantics of the business functions provided by the service must alleviate this tension. By abstracting and loosening the behavior supported by services in their messaging contracts for business functionality, it is possible to cope with the difference between "then" and "now". This is exactly what has been seen for generations in cross-business work. For example, the department store catalog includes many different products and guarantees the price as shown in the catalog for more than six months. Business logic and the intelligent design of the service contracts can cope with the impedance between "then" and "now".
Representations of Data: Inside and Outside
The remainder of the paper focuses on the representations of data on the inside and outside. It also analyzes and compares three models, Extensible Markup Language (XML), Structured Query Language (SQL), and objects. The information presented ultimately shows seminal differences between data inside a service and data sent outside the service. Also revealed is that the strength of each model in one use becomes its weakness in another use.
Inside and Outside Data
Data can be broken down into two main classes: data on the inside and data on the outside. Data on the outside is sent across service boundaries in the form of messages and includes business-function and reference data, which must be understood by both the sender and the receiver. The data is always immutable and can be versioned. Since it travels between services, versions are likely stored as replicas by the receiving services. Conversely, data on the inside lives in the service and is rarely sent out of the service. If it is transmitted outside, it must be processed by the business logic. This class of data is by nature private to the service and encapsulated by code. Below is a table providing additional information on both classes of data:
Table 3. Data inside vs. data outside
                Immutable    Normalization Is Interesting    Stable
Outside Data    Yes          No: Immutable                   Yes
Inside Data     No           Yes                             Not Necessarily
The table identifies data on the outside as always immutable. This makes the notion of normalization unimportant. Yet, immutability is not enough because data on the outside must also be stable so its meaning is clearly understood across different times and locations. Versioning and time stamping are recommended to make data stable. Although stability is not a concern for data on the inside, it can be stable. For instance, the data can include stable items if it refers to data being sent to the outside. Since data on the inside is not necessarily immutable, it should be normalized to prevent update anomalies.
Bounded and Unbounded Data Representations
Let's consider the ways in which SQL and XML represent their data and the implications on the scope of that representation.
All data stored inside a SQL database lives within the bounds of the database. Therefore, value-based comparisons are only meaningful if the domain of the values is the same. Interpretation of a SQL database outside of the database itself is impossible.
The scope of the transaction used to modify the database defines the temporal bound for relating values contained inside the database. For instance, if the transaction is committed and a copy of the data is sent outside the service, the underlying values may change before the copy is used. The interpretation of the values after they are unlocked is subject to interesting semantic anomalies and is usually avoided.
Lastly, relational data also has a tightly managed schema. The schema is modifiable through Data Definition Language (DDL) within a transaction. DDL is usually tightly associated with the SQL database. Indeed, as soon as one transaction commits, the next may change the DDL that made the previous transaction meaningful! This extremely flexible behavior is correctly defined only within the bounds of the database.
Unlike SQL data, XML documents and messages are unbounded because of their open schema. XML's schema definition allows independent creation of the schema, which lets people design their own customized markup languages or borrow portions of other schemas. Moreover, each schema is identified with a Uniform Resource Identifier (URI), indicating an immutable version of the meta-data. As a message is sent, the URI for its meta-data unambiguously specifies the schema for interpreting the message. This meta-data remains invariant across space and time.
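A message that names its schema by URI can be sketched with Python's standard `xml.etree.ElementTree` module. The URI `urn:example:schemas:order:v1`, the element names, and the `parse_order` function are hypothetical; an XML namespace is used here as a simple stand-in for a schema identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema URI; the version is part of the identifier and never changes.
ORDER_SCHEMA_V1 = "urn:example:schemas:order:v1"

document = f"""<order xmlns="{ORDER_SCHEMA_V1}">
  <customerId>C-42</customerId>
  <sku>SKU-1</sku>
</order>"""


def parse_order(xml_text: str) -> dict:
    """Interpret a message only if it names a schema version this receiver knows."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{uri}localname".
    if root.tag != f"{{{ORDER_SCHEMA_V1}}}order":
        raise ValueError("unknown schema: " + root.tag)
    ns = {"o": ORDER_SCHEMA_V1}
    return {
        "customer_id": root.findtext("o:customerId", namespaces=ns),
        "sku": root.findtext("o:sku", namespaces=ns),
    }


parsed = parse_order(document)
```

A message that arrives under a different URI is rejected rather than guessed at, which keeps the interpretation tied to the immutable meta-data it names.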
Another matter that sets XML and SQL apart is that XML uses references and not values to connect information. Connections between sub-trees and across documents are done through references, which are implemented as URIs. When implemented as URIs, references are an unambiguous mechanism for connecting data together that remains intact in the face of schema changes, location changes, and temporal changes.
Because XML information lives in the "then" and is always referring to the past or the future, it can be unambiguously interpreted anytime. The notion of "now" is difficult across systems that do not share atomic transactions. This ability to specify "then" in the semantics of the data makes it interpretable both anytime and anywhere. SQL, living in the intimacy of the "now" within the SQL database, cannot be accurately understood except in the "now" and inside the bounds of the SQL database or its surrogates.
Characteristics of Inside and Outside Data
The next section takes a more in-depth look at data on the inside and data on the outside and further differentiates between the two by comparing message data, reference data, and service-private data against numerous characteristics.
Table 4. Characteristics of data on the inside and outside
                                                Message Data (Outside)          Reference Data (Outside)        Service-Private Data (Inside)
Immutable?                                      Yes                             Yes                             No
Requires Open Schema for Interoperability?      Yes                             Yes                             No
Represent in XML?                               Yes                             Yes                             No
Encapsulation Useful?                           No                              No                              Yes
Encapsulated Access                             No                              No                              Maybe
Long-Lived Evolving Data with Evolving Schema?  No                              No: Immutable Versions          Yes
Business Intelligence?                          Yes                             Yes                             Yes
Store in SQL?                                   Yes: Copy of XML Stored in SQL  Yes: Copy of XML Stored in SQL  Yes
Message data and reference data, both data on the outside, are similar in that they are both immutable, have an open schema, and are best represented in XML. Service-private data, on the other hand, promotes encapsulation, which is the opposite of an interoperable open schema. Even schema evolves differently for data on the inside and data on the outside. For instance, when a message schema is changed, explicit versioning of the schema is used. The schema of data on the inside, by contrast, may evolve dynamically as DDL changes the current state of the schema. Lastly, both incoming and outgoing messages are shredded into SQL because business intelligence matters for all data, whether it lives inside or outside. Usually, the amount of XML shredded for queries depends on how much of it someone wants analyzed. The extensibility of XML means that shredding is sometimes easy and sometimes more challenging, because data may arrive that does not map to the schema.
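Shredding can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The message layout, the table names, and the `shred` function are hypothetical; the `giftNote` element stands for extension data that has no place in the relational schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

message = """<order id="o-1">
  <line sku="SKU-1" qty="2"/>
  <line sku="SKU-2" qty="1"/>
  <giftNote>Happy birthday</giftNote>
</order>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (order_id TEXT PRIMARY KEY, xml TEXT)")
conn.execute("CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER)")


def shred(conn, xml_text: str) -> None:
    """Keep the full XML copy and shred only the parts wanted for queries."""
    root = ET.fromstring(xml_text)
    order_id = root.get("id")
    with conn:  # one transaction for the copy and the shredded rows
        conn.execute("INSERT INTO messages VALUES (?, ?)", (order_id, xml_text))
        for line in root.findall("line"):
            conn.execute(
                "INSERT INTO order_lines VALUES (?, ?, ?)",
                (order_id, line.get("sku"), int(line.get("qty"))),
            )
        # <giftNote> has no column; it survives only in the stored XML copy.


shred(conn, message)
total_qty = conn.execute("SELECT SUM(qty) FROM order_lines").fetchone()[0]
```

The order lines become queryable rows, while anything that does not map to the schema remains available in the stored copy of the original message.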
Once the different kinds of data are identified and their functions understood, most people start to wonder what technologies to use. The following section examines three models, XML, SQL, and objects, and their representation of data.
What about Objects?
Most people know about the power of object-orientation to facilitate software engineering. It, too, has a place in the battle for the representation of data. People have seen object persistence and object-oriented databases become popular and, sometimes, fail to live up to expectations. Still, there are important forces that encourage the use of object-orientation as a means of representing data.
The biggest advantage objects have over SQL and XML is that they provide encapsulation. The stored data is hidden from the user of the data, and only the behavior of the methods provided by the object is visible. This is very similar to the way services expose business functionality through the messages defined by their schema and their contracts. The seminal difference between objects and services is that services never share data except as reference data. This is a much looser relationship than exists across objects.
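As a small illustration of encapsulation, consider the following Python sketch. The Inventory class and its method names are hypothetical, and Python marks private state only by convention (the leading underscore) rather than enforcing it; the sketch simply shows callers working through behavior while the stored data stays out of view.

```python
class Inventory:
    """Hypothetical service-private state, reachable only through its methods."""

    def __init__(self):
        # Leading underscore: private by Python convention, not enforced by the language.
        self._on_hand = {}

    def receive(self, sku: str, quantity: int) -> None:
        """Add stock for a SKU."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._on_hand[sku] = self._on_hand.get(sku, 0) + quantity

    def reserve(self, sku: str, quantity: int) -> bool:
        """Decrement stock and return True only if enough is on hand."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self._on_hand.get(sku, 0) < quantity:
            return False
        self._on_hand[sku] -= quantity
        return True
```

A caller can ask whether a reservation succeeded but cannot compare or join stock levels arbitrarily, which is the trade-off between encapsulation and open querying.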
Still, the use of objects to implement services is highly recommended. It is difficult to believe that anyone would want to implement a service without the benefits of an object-oriented environment! If a service is implemented using objects, it is opaque to all of its partner services.
The ruling triumvirate: XML, SQL, and objects
It is interesting to note that each model's strength is simultaneously its weakness. For instance, it is the independence of schema definition, coupled with a reference-based hierarchical representation and the temporal interpretation of the data, that makes XML wonderful for communicating across services. These immutable messages are easily created and interpreted across different services. These features derive directly from the unbounded nature of XML. Still, these features of XML for the "outside" are debilitating weaknesses on the inside. It is problematic to query, shape, and update XML with the richness available to normalized data that represents the "now" and is stored in SQL. These weaknesses also derive directly from the unbounded nature of XML.
Because of its characteristics, SQL makes a great tool for representing data on the inside of services. Its strengths are inextricably linked to its bounded nature in both space and time, which makes it fantastic for representing data on the "inside". Unlike XML, SQL has strong querying capabilities. SQL can compare almost anything with anything else within the bounds of the database. Because of its bounded nature, however, SQL cannot match the strengths of XML on the "outside". SQL does not offer independent definition of schema, as it depends heavily on a centralized and tightly coupled DDL.
Neither XML nor SQL offers encapsulation. In XML, encapsulation is impossible because of its open schema. SQL does not enforce encapsulation either, since data is changed directly through UPDATE DML. Unlike XML and SQL, objects and components thrive on encapsulation. By its very nature, encapsulation prevents the arbitrary comparison of any data, since the data is not visible. Therefore, it is impossible to perform queries. Extensibility and independent definition are also impossible in this model, since encapsulation implies that the schema is concealed.
In summary, XML thrives in the world of communicating requests, responses, and reference data between services. It provides all of the functionality, scalability, and granularity required by messages. In terms of storing data, SQL databases lead the way, offering many benefits, such as storing incoming and outgoing messages. Their capabilities are further bolstered when they are used for audits, compliance, or business intelligence. When building a service, objects are recommended because encapsulation facilitates the construction of software.
Figure 13. XML, SQL, and objects working together
In the end, this information is not presented to lobby for one model over another. Instead, it is offered to illustrate a fascinating fact: each model's strength is at the same time its weakness. It is important to recognize, however, that all three models are critical in a Service Oriented Architecture.
Conclusion
This paper examines an intricate part of SOA: data inside a service and data sent outside the service boundary. The roles and relationships of services, messages, and different technology models were explored to illustrate the difference between the two kinds of data.
Services are connected only by messages; otherwise, they are independent and behave differently from each other. The messages sent between services describe a business function and contain operands that commonly express parameters or options for the operation. This is a much freer and less intimate relationship than in traditional distributed systems. Because services are different, atomic database transactions are not shared across service boundaries. The uncertainty of joint decisions is handled by the business logic of the interacting services.
Messages are also different. Once a message is outside the service boundary, it must be immutable. In addition, special attention must be given to the stability of the data. Stability ensures the data is understood across space and time.
Reference data plays an important role in messaging. It refers to data published across service boundaries. Operands, historic artifacts, and shared collections are all types of reference data. Operands are parameters or options used to create operator requests sent out for processing. Historic artifacts describe past events that took place inside the publishing service. Shared collections of data are used by multiple services. They are also constantly evolving. As a result, partner services usually only have access to a recent version of the data.
When discussing the representation of data inside and outside of services, SQL, XML and Objects all have a worthy place. Interestingly, the essence of what makes one of these models strong in one area of use also makes it weak in another area of use. This is the reason for the longevity of the differences across the communities of specialists in data representation.
Finally, data outside a service is different from data inside a service. Data on the inside is described as living in the "now." The data is usually private to the service and encapsulated by service code. On the other hand, data on the outside lives in the past. It is passed in messages and understood by both the sender and the receiver.
About the author
Pat Helland has 25 years of experience in the software industry and has been an architect at Microsoft since 1994. He has worked for more than 20 years in database, transaction processing, and distributed systems, as well as fault-tolerant and scalable systems. Pat worked at Tandem Computers, where he designed TMF (Transaction Monitoring Facility). He was one of the founders of the team that implemented and shipped Microsoft Transaction Server (MTS), now COM+. Pat has recently focused his thinking on loosely coupled application environments. Pat can be reached at phelland@microsoft.com.