1.2 Condor's Power
Condor的功用
Condor is a software system that creates a High-Throughput Computing (HTC) environment. It effectively utilizes the computing power of workstations that communicate over a network. Condor can manage a dedicated cluster of workstations, but its power comes from the ability to effectively harness non-dedicated, preexisting resources under distributed ownership.
Condor是一个能创建高吞吐量计算(HTC)环境的软件系统。它能有效利用通过网络互连的众多工作站的运算能力。Condor还可以管理一个专用的工作站集群。它的这种能力来自于其可以在分布式所有权环境下有效利用零散的现有资源的机制。
A user submits a job to Condor. Condor finds an available machine on the network and begins running the job on that machine. Condor can detect that a machine running a Condor job is no longer available (perhaps because the owner of the machine came back from lunch and started typing on the keyboard). It can then checkpoint the job and move (migrate) it to a different machine that would otherwise be idle. Condor continues the job on the new machine from precisely where it left off.
当某个用户向Condor提交任务。Condor会从网络中找到一台可用的机器并在这台机器上运行该任务。Condor能够侦测出一台正在运行某项Condor任务的机器不再有效(也许是因为机器的主人吃完午饭回来并触动了键盘)。Condor能够对任务建立校验点并将其迁移到另一台空闲的机器上面。然后Condor在新机器上从刚才中止的地方继续执行这项任务。
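The checkpoint-and-migrate idea described above can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the job here saves its own state explicitly as JSON, and the names `run_job`, `checkpoint`, and `restore` are hypothetical and not part of Condor.

```python
import json

def run_job(state, steps_available):
    """Advance a toy summation job by at most `steps_available` steps."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    while steps_available > 0 and state["next"] <= state["limit"]:
        state["total"] += state["next"]
        state["next"] += 1
        steps_available -= 1
    return state

def checkpoint(state):
    """Serialize the job state so it can be shipped to another machine."""
    return json.dumps(state)

def restore(blob):
    """Rebuild the job state from a checkpoint."""
    return json.loads(blob)

# The job: add up the integers 1..100.
initial = {"next": 1, "limit": 100, "total": 0}

# "Machine A" runs 40 steps, then its owner comes back from lunch.
on_a = run_job(initial, 40)
blob = checkpoint(on_a)

# "Machine B" restores the checkpoint and finishes the remaining work.
on_b = run_job(restore(blob), 1000)
print(on_b["total"])  # 5050
```

No work done on the first machine is repeated: the second run starts at step 41, which is the point where the first one stopped.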
In those cases where Condor can checkpoint and migrate a job, Condor makes it easy to maximize the number of machines which can run a job. In this case, there is no requirement for machines to share file systems (for example, with NFS or AFS), so that machines across an entire enterprise can run a job, including machines in different administrative domains.
对于允许Condor对任务建立校验点并实施迁移的情形,Condor可以很容易的使运行这项任务的机器数量达到最多。对于这种情况,并不要求这些机器共享文件系统(例如,通过NFS或AFS),因此整个企业的机器都能被用来运行某项任务,包括那些位于不同管理区域的机器。
Condor can be a real time saver when a job must be run many (hundreds of) different times, perhaps with hundreds of different data sets. With one command, all of the hundreds of jobs are submitted to Condor. Depending upon the number of machines in the Condor pool, dozens or even hundreds of otherwise idle machines can be running the job at any given moment.
当某项任务必须运行很多次(上百次),或许还要使用上百个不同的数据集的时候,Condor能够实实在在的节省时间。只需一条命令,所有这几百项任务就都提交给了Condor。根据Condor机群中的机器数量,十几台乃至上百台空闲的机器就能用来在任何指定的时间运行这项任务。
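As a rough illustration of submitting many runs at once, the Python sketch below generates a submit description in which a single `queue` statement requests many jobs and the `$(Process)` macro selects a different data set for each one. The executable name `analyze`, the file names, and the helper `make_submit_description` are assumptions made for this example.

```python
def make_submit_description(executable, n_jobs):
    """Build a submit description that queues `n_jobs` runs of one program.

    $(Process) expands to 0, 1, 2, ... for each queued job, so every run
    reads and writes its own files. All file names are hypothetical.
    """
    lines = [
        f"executable = {executable}",
        "input = data.$(Process).in",
        "output = result.$(Process).out",
        "error = result.$(Process).err",
        "log = sweep.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"

text = make_submit_description("analyze", 300)
print(text)
```

The resulting text would be saved to a file and handed to `condor_submit` in a single command; the details of a real submit file depend on the Condor version and the job's universe.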
Condor does not require an account (login) on the machines where it runs a job. Condor can do this because of its remote system call technology, which traps library calls for operations such as reading from or writing to disk files. The calls are transmitted over the network and performed on the machine where the job was submitted.
Condor不要求运行任务的机器上具备某个(登陆)账号。Condor可以通过其远程系统调用技术来实现这一点,远程调用会捕获诸如从磁盘读写这类操作的库调用。这些调用将通过网络传输到提交任务的机器上面加以执行。
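The forwarding idea behind remote system calls can be sketched in a few lines of Python. This is a toy model, not Condor's mechanism: the class names `SubmitSideHandler` and `RemoteIOStub` are hypothetical, the "network" is a direct function call, and the submit machine's disk is a dictionary.

```python
class SubmitSideHandler:
    """Stands in for the submit machine, where file operations really happen."""

    def __init__(self, files):
        self.files = dict(files)  # path -> contents, a pretend file system

    def handle(self, request):
        op, path = request["op"], request["path"]
        if op == "read":
            return self.files[path]
        if op == "write":
            self.files[path] = self.files.get(path, "") + request["data"]
            return len(request["data"])
        raise ValueError(f"unsupported operation: {op}")


class RemoteIOStub:
    """Job-side replacement for file I/O: forwards calls instead of using local disk."""

    def __init__(self, send):
        self.send = send  # in a real system this would go over the network

    def read(self, path):
        return self.send({"op": "read", "path": path})

    def write(self, path, data):
        return self.send({"op": "write", "path": path, "data": data})


def job(io):
    """A job that only touches files through the stub it is given."""
    numbers = [int(x) for x in io.read("input.txt").split()]
    io.write("output.txt", str(sum(numbers)))


handler = SubmitSideHandler({"input.txt": "1 2 3 4"})
job(RemoteIOStub(handler.handle))
print(handler.files["output.txt"])  # 10
```

The job never opens a file on the machine where it executes, which is why that machine needs neither the user's account nor the user's files.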
Condor provides powerful resource management by match-making resource owners with resource consumers. This is the cornerstone of a successful HTC environment. Other compute cluster resource management systems attach properties to the job queues themselves, resulting in user confusion over which queue to use as well as administrative hassle in constantly adding and editing queue properties to satisfy user demands. Condor implements ClassAds, a clean design that simplifies the user's submission of jobs.
通过在资源拥有者和资源消费者之间创建匹配,Condor提供了强大的资源管理。这正是一个高性能HTC环境的构建基础。其它计算集群资源管理系统往往要给任务队列添加某些属性,这会导致用户对于该使用哪一个队列迷惑不清,而通过频繁的添加和编辑队列属性来满足用户需求也会造成管理上的混乱。Condor则通过实现ClassAd这样一个清晰的设计简化了用户的任务提交过程。
ClassAds work in a fashion similar to the classified want-ads in a newspaper. All machines in the Condor pool advertise their resource properties, both static and dynamic, such as available RAM, CPU type, CPU speed, virtual memory size, physical location, and current load average, in a resource offer ad. A user specifies a resource request ad when submitting a job. The request defines both the required and the desired properties of the resource that will run the job. Condor acts as a broker by matching and ranking resource offer ads against resource request ads, making certain that all requirements in both ads are satisfied. During this match-making process, Condor also considers several layers of priority values: the priority the user assigned to the resource request ad, the priority of the user who submitted the ad, and the desire of machines in the pool to accept certain types of ads over others.
ClassAd的工作方式类似于报纸上的分类供求广告。Condor机群中的所有机器都会在一个资源供应广告中申明各自静态和动态的资源储备,比如可用的RAM内存,CPU类型,CPU速度,虚拟内存大小,物理位置,以及系统平均负载。一个用户在提交某项任务的时候会制定一个资源需求广告。这份需求书定义了运行这项任务所需的必要和最佳资源用量。Condor像个经济人一样在资源供应广告和资源需求广告之间进行匹配和分配,确保供需双方的要求都能得到满足。在匹配过程进行期间,Condor也会考虑几种优先级层次:用户赋予资源需求广告的优先级,提交广告的用户本身的优先级,还有机群中的机器对于某类广告的偏爱。
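The two-sided matching described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the ClassAd language itself: the `Ad` class, the `match` function, and the attribute names are illustrative, and the user-priority layers and the machines' own preferences are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Ad:
    attrs: dict                            # advertised properties
    requirements: Callable[[dict], bool]   # must hold for the other party's attrs
    rank: Callable[[dict], float]          # preference among acceptable parties

def match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    """Return the offer the request ranks highest among mutually acceptable ones."""
    acceptable = [
        offer for offer in offers
        if request.requirements(offer.attrs)       # the job accepts the machine
        and offer.requirements(request.attrs)      # the machine accepts the job
    ]
    return max(acceptable, key=lambda offer: request.rank(offer.attrs), default=None)

def accept_all(job):
    return True

def no_preference(job):
    return 0.0

offers = [
    Ad({"Name": "ws1", "Arch": "INTEL", "Memory": 512}, accept_all, no_preference),
    Ad({"Name": "ws2", "Arch": "INTEL", "Memory": 2048},
       lambda job: job["Owner"] != "guest", no_preference),
    Ad({"Name": "ws3", "Arch": "SPARC", "Memory": 4096}, accept_all, no_preference),
]

# Required: an INTEL machine with at least 256 MB. Desired: as much memory as possible.
job = Ad({"Owner": "alice"},
         requirements=lambda m: m["Arch"] == "INTEL" and m["Memory"] >= 256,
         rank=lambda m: m["Memory"])

best = match(job, offers)
print(best.attrs["Name"])  # ws2
```

Here `ws3` has the most memory but fails the job's requirements, so it is never considered. The same job submitted by the owner `guest` would land on `ws1`, because `ws2` refuses that owner.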
Please do not reproduce this translation without the author's permission.