1.1 High-Throughput Computing (HTC) and its Requirements
高吞吐量计算(HTC)及其必备条件
For many research and engineering projects, the quality of the research or the product depends heavily on the quantity of computing cycles available. It is not uncommon to find problems that require weeks or months of computation to solve. Scientists and engineers engaged in this sort of work need a computing environment that delivers large amounts of computational power over a long period of time. Such an environment is called a High-Throughput Computing (HTC) environment. In contrast, High-Performance Computing (HPC) environments deliver a tremendous amount of compute power over a short period of time. HPC environments are often measured in terms of floating-point operations per second (FLOPS). A growing community is concerned not with operations per second but with operations per month or per year. Their problems are of a much larger scale: they are more interested in how many jobs they can complete over a long period of time than in how fast an individual job can complete.
对于很多研究和工程项目,研究或产品的最终质量高度依赖于可用计算资源的数量。往往有工作需要耗时几周乃至几个月的计算来寻找问题的解决方案。从事此类项目的科学家和工程师们需要一个能在较长时间内提供大量运算能力的计算环境。这样的一个环境被称作高吞吐量计算(HTC)环境。与之相对照,高性能计算(HPC)环境可以在较短时间内提供庞大的运算能力。HPC环境通常以每秒浮点运算次数(FLOPS)加以度量。目前有一个正不断发展的计算社群,他们所关注的并非每秒的运算次数而是每月或者每年的运算次数。他们所面对的问题规模相当大。他们更关心的是在一段比较长的时间内到底可以完成多少任务而不是某一项任务究竟能多快完成。
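The difference between the two measures can be made concrete with a little arithmetic. The sketch below is only an illustration: the machine counts, job durations, and the helper name `jobs_per_month` are assumptions invented for this example, not figures from the text. It compares one fast machine with a pool of many slower ones by counting jobs completed in a 30-day month.

```python
# Illustrative sketch only: every number here is a made-up assumption.
HOURS_PER_MONTH = 30 * 24  # a 30-day month


def jobs_per_month(machines: int, hours_per_job: float) -> float:
    """Jobs completed in one month by `machines` identical machines,
    each running independent jobs back to back."""
    return machines * HOURS_PER_MONTH / hours_per_job


# One fast machine: each job finishes in 2 hours.
fast_single = jobs_per_month(machines=1, hours_per_job=2.0)

# One hundred slower machines: each job takes 10 hours.
slow_pool = jobs_per_month(machines=100, hours_per_job=10.0)

print(f"single fast machine: {fast_single:.0f} jobs/month")
print(f"pool of slow machines: {slow_pool:.0f} jobs/month")
```

Under these assumed numbers, each individual job runs five times slower on the pool, yet the pool completes twenty times as many jobs in the month. That is the sense in which throughput, rather than per-job speed, is the quantity of interest.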
The key to HTC is to efficiently harness all available resources. Years ago, the engineering and scientific community relied on a large, centralized mainframe or a supercomputer to do computational work. A large number of individuals and groups needed to pool their financial resources to afford such a machine. Users had to wait for their turn on the mainframe, and they were allocated only a limited amount of time. While this environment was inconvenient for users, the utilization of the mainframe was high; it was busy nearly all the time.
HTC的关键在于有效利用所有的可用资源。数年前,工程和科学社群还要依赖一台大型中央主机或者一台超级计算机来完成计算工作。大量的个人和群体需要汇集他们的商业经费来负担这样的一台机器。用户不得不轮流等待着主机的使用权,而分配到的时间数量却十分有限。这样的环境对用户来说实在不方便,不过主机的利用率倒是很高;几乎永远处于忙碌状态。
As computers became smaller, faster, and cheaper, users moved away from centralized mainframes and purchased personal desktop workstations and PCs. An individual or small group could afford a computing resource that was available whenever they wanted it. The personal computer is slower than the large centralized machine, but it provides exclusive access. Now, instead of one giant computer for a large institution, there may be hundreds or thousands of personal computers. This is an environment of distributed ownership, where individuals throughout an organization own their own resources. The total computational power of the institution as a whole may rise dramatically as a result of such a change, but because of distributed ownership, individuals have not been able to capitalize on the institutional growth of computing power. And while distributed ownership is more convenient for the users, the utilization of the computing power is lower. Many personal desktop machines sit idle for very long periods of time while their owners are busy doing other things (away at lunch, in meetings, or at home sleeping).
因为当今的计算机已经变得更小,更快,更便宜,所以用户从使用中央主机转向购买个人台式工作站和PC。个人或小群体能够在任何需要的时候负担这样的一个计算资源。虽然个人电脑比大型机来得慢,但是它提供了独享的访问权限。现今,取代了只有一台巨型机的情形,一个大型机构里可能会有成百上千的个人电脑。这是一个分布式所有权的环境,也就是组织中的各个人员都拥有属于自己的资源。如此一来机构的总体运算能力可能得到了显著的增强,但是由于分布式的所有权,个人并未从所在机构的运算能力增长中获得好处。另外,尽管分布式所有权对于用户来说更加方便,但运算能力的真正使用率却很低。很多个人台式机会长时间的空闲而它们的主人却忙于其它事情(比如外出午餐,开会,或者在家睡觉)。
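The utilization argument can be sketched the same way. The example below is a rough illustration under stated assumptions: the number of desktops, the hours each owner actively uses a machine, and the function names are all hypothetical, chosen only to show how much capacity sits idle under distributed ownership.

```python
# Illustrative sketch only: the machine count and usage hours are assumptions.
HOURS_PER_DAY = 24


def utilization(busy_hours_per_day: float) -> float:
    """Fraction of the day a machine is actually in use by its owner."""
    return busy_hours_per_day / HOURS_PER_DAY


def idle_machine_hours(machines: int, busy_hours_per_day: float, days: int) -> float:
    """Total machine-hours left unused across all desktops over `days` days."""
    return machines * (HOURS_PER_DAY - busy_hours_per_day) * days


# Assume 500 desktops, each actively used 6 hours a day by its owner.
desktops = 500
busy = 6.0

print(f"utilization per desktop: {utilization(busy):.0%}")
print(f"idle machine-hours per week: {idle_machine_hours(desktops, busy, days=7):.0f}")
```

With these assumed figures each desktop is busy a quarter of the time, and the institution as a whole leaves tens of thousands of machine-hours unused every week, which is the capacity an HTC environment would aim to harness.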
Please do not repost this translation without the author's permission.