1.4 Current Limitations
Limitations on Jobs that can be Checkpointed
Although Condor can schedule and run any type of process, it places some limitations on the jobs that it can transparently checkpoint and migrate:
1. Multi-process jobs are not allowed. This includes system calls such as fork(), exec(), and system().
2. Interprocess communication is not allowed. This includes pipes, semaphores, and shared memory.
3. Network communication must be brief. A job may make network connections using system calls such as socket(), but a network connection left open for long periods will delay checkpointing and migration.
4. Sending or receiving the SIGUSR2 or SIGTSTP signals is not allowed. Condor reserves these signals for its own use. Sending or receiving all other signals is allowed.
5. Alarms, timers, and sleeping are not allowed. This includes system calls such as alarm(), getitimer(), and sleep().
6. Multiple kernel-level threads are not allowed. However, multiple user-level threads are allowed.
7. Memory-mapped files are not allowed. This includes system calls such as mmap() and munmap().
8. File locks are allowed, but they are not retained between checkpoints.
9. All files must be opened read-only or write-only. A file opened for both reading and writing will cause trouble if a job must be rolled back to an old checkpoint image. For compatibility reasons, a file opened for both reading and writing will result in a warning but not an error.
10. A fair amount of disk space must be available on the submitting machine for storing a job's checkpoint images. A checkpoint image is approximately equal in size to the virtual memory consumed by a job while it runs. If disk space is short, a special checkpoint server can be designated for storing all the checkpoint images for a pool.
11. On Digital Unix (OSF/1), HP-UX, and Linux, your job must be statically linked. Dynamic linking is allowed on all other platforms.
Note: these limitations apply only to jobs that Condor has been asked to transparently checkpoint. If job checkpointing is not desired, the limitations above do not apply.
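Items 1, 5, and 9 in the list above can be illustrated with a minimal sketch of a job routine, assuming a job that totals the integers in a text file. The function name sum_file and the file paths passed to it are hypothetical; nothing in the code is specific to Condor. The input is opened read-only and the output write-only, so no file is ever open for both reading and writing, and the routine stays within one process without sleeping or setting timers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical job routine: sum the integers in in_path and write the
 * total to out_path. Returns 0 on success, -1 on any I/O failure.
 */
int sum_file(const char *in_path, const char *out_path)
{
    assert(in_path != NULL && out_path != NULL);

    FILE *in = fopen(in_path, "r");      /* read-only handle */
    if (in == NULL)
        return -1;

    long total = 0;
    long value;
    while (fscanf(in, "%ld", &value) == 1)
        total += value;
    fclose(in);                          /* input closed before output opens */

    FILE *out = fopen(out_path, "w");    /* write-only handle, truncates */
    if (out == NULL)
        return -1;

    int ok = fprintf(out, "%ld\n", total) > 0;
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A job's main() would call sum_file() once with its input and output paths. Using two separate one-directional handles is what keeps the routine clear of the read-write warning described in item 9.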
Security Implications
Condor does a significant amount of work to prevent security hazards, but loopholes are known to exist. Condor can be instructed to run user programs only as the UNIX user nobody, a user login which traditionally has very restricted access. But even with access solely as user nobody, a sufficiently malicious individual could fill up /tmp (which is world-writable) or gain read access to world-readable files. Furthermore, where the security of machines in the pool is a high concern, only machines whose UNIX user root can be trusted should be admitted into the pool. Condor provides the administrator with extensive security mechanisms to enforce desired policies.
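The exposure to user nobody described above comes down to the "other" permission bits on files and directories. As a rough sketch, an administrator could check those bits with the POSIX stat() call. The helper names is_world_readable, is_world_writable, and report_exposure are hypothetical and are not part of Condor.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return 1 if "other" users have read permission on path, 0 if they do
 * not, and -1 if stat() fails. This inspects the mode bits only; real
 * access also depends on search permission on every parent directory.
 */
int is_world_readable(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & S_IROTH) ? 1 : 0;
}

/* Same check for the "other" write bit, as set on a typical /tmp. */
int is_world_writable(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & S_IWOTH) ? 1 : 0;
}

/* Print both results for one path. */
void report_exposure(const char *path)
{
    printf("%s: world-readable=%d world-writable=%d\n",
           path, is_world_readable(path), is_world_writable(path));
}
```

Calling report_exposure() on a sensitive path before admitting a machine to the pool shows whether a job running as nobody could read or write it.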
Jobs Need to be Re-linked to get Checkpointing and Remote System Calls
Although typically no source code changes are required, Condor requires that jobs be re-linked with the Condor libraries to take advantage of checkpointing and remote system calls. This often precludes commercial software binaries from taking advantage of these services, because commercial packages rarely make their object code available. Condor's other services are still available for these commercial packages.
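Because re-linking happens at build time, the job's source can stay ordinary C. The sketch below is a plain compute routine with no Condor-specific calls. The function name collatz_steps and the file names in the comment are hypothetical, and the build line assumes the condor_compile wrapper shipped with Condor, which this section does not name.

```c
#include <assert.h>

/*
 * Assumed re-link step (not taken from this section):
 *     condor_compile cc job.o -o job
 * The source itself needs no changes for that step.
 *
 * Count the Collatz steps needed to reach 1 from n (n must be >= 1).
 * Pure computation: no fork(), no timers, no memory-mapped files.
 */
long collatz_steps(unsigned long n)
{
    assert(n >= 1);
    long steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```

The same object file can be linked normally for local testing and re-linked for Condor afterwards, which is why the requirement falls on access to object code and not on source changes.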