時刻と時間の管理

情報学類 オペレーティングシステム II					2008年02月05日

                                       筑波大学システム情報工学研究科
                                       コンピュータサイエンス専攻, 電子・情報工学系
                                       新城 靖
                                       <yas@is.tsukuba.ac.jp>

このページは、次の URL にあります。
http://www.coins.tsukuba.ac.jp/~yas/coins/os2-2007/2008-02-05
あるいは、次のページから手繰っていくこともできます。
http://www.coins.tsukuba.ac.jp/~yas/
http://www.cs.tsukuba.ac.jp/~yas/

■復習

システムプログラム
- gettimeofday(), time(), struct timeval, time_t t
- asctime(), ctime(), gmtime(), localtime(), mktime(), struct tm
- SIGALRM, SIGALRM, SIGPROF, setitimer()

■参考文献

VMware: "Timekeeping in VMware Virtual Machines", White paper (2005).

高橋浩和, 小田逸郎, 山幡為佐久: "Linuxカーネル2.6解読室",フトバンククリエイティブ (2006/11/18). ISBN-13: 978-4797338263.

Marshall Kirk McKusick, George V. Neville‐Neil (著), 歌代和正, 砂原秀樹(翻訳): "BSDカーネルの設計と実装―FreeBSD詳解", アスキー (2005/10/18). ISBN-13: 978-4756146793. Marshall Kirk McKusick, George V. Neville-Neil: "The Design And Implementation Of The Freebsd Operating System Addison-Wesley (2004/7/30). ISBN-13: 978-0201702453

■オペレーティング・システム

２つのインタフェースを、うまくつなぐもの。

図? オペレーティング・システムの働き

どんなシステム・コールを提供すべきか
どんなハードウェアが利用できるのか
(内部で必要とされる機能)

ライブラリとシステム・コールの違いに注意

■時刻と時間

◆必要な機能

カレンダ時刻の提供, gettimeofday(), time()
インターバルタイマ, setitimer()
時間切れ, select(), poll()
カーネル内の仕事。時間刻み(time slice)の実現、ラウンドロビン・スケジューリング実時間スケジューリング。アカウンティング。

◆カレンダ時刻の提供

struct timeval {
	long    tv_sec;         /* seconds since Jan. 1, 1970 */
	long    tv_usec;        /* and microseconds */
};

int gettimeofday(struct timeval *tp, struct timezone *tzp)

struct timeval  tv;
gettimeofday(&tv, NULL);

μ(マイクロ)秒単位の時刻を返す。POSIX 1003.1, 2003 では、ナノ秒単位。

struct timespec  {
    time_t tv_sec;            /* Seconds.  */
    long int tv_nsec;         /* Nanoseconds.  */
};

int clock_settime(clockid_t clock_id, const struct timespec *tp);
int clock_gettime(clockid_t clock_id, struct timespec *tp);
int clock_getres(clockid_t clock_id, struct timespec *res);

◆インターバルタイマ

定期的な仕事をしたい時に使う

struct itimerval {
    struct timeval it_interval; /* next value */
    struct timeval it_value;    /* current value */
};

int setitimer(int which, const struct itimerval *value, 
    struct itimerval *ovalue);

まず、it_value で指定された時間が経つと、シグナルが通知される。
以後、it_interval 毎にシグナルが通知される。

同じ話。

内部にカウントダウンしていく変数がある。
初期値として、it_value を設定する。
変数が 0 になると、シグナルが通知される。
変数の値を、it_interval に設定する。
3. を、解除されるまで繰り返す。

◆時間切れ

int  select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
     		struct timeval *timeout);
int  poll(struct pollfd *fds, nfds_t nfds, int timeout);

ネットワーク・プログラムでよく使う。複数の入力を監視する。指定された時間、入力がなければ、システム・コールから復帰する。

なにもしない時間切れ。

unsigned int sleep(unsigned int seconds);
int usleep(useconds_t usec)
int nanosleep(const struct timespec *rqtp, struct timespec *rmtp);

◆シグナルとハードウェア割込み

Unix のシグナルは、ハードウェア割込みを元に機能が設計されている。

種類が小数、固定的な利用
引数がない
プロセス全体で大域的（スレッドに割込みは合わない）
割込み禁止(割込みマスク)という考え方がある

up-call の方が、ソフトウェアとしては使いやすい。

◆up-call

図? システムコールとアップコール

アップコールは、システム・コールの逆に、オペレーティング・システムがユーザ・レベルの関数を呼び出す。ウインドウ・システムのプログラムでは、call-back 関数やlistener として利用されている。

■時刻・時間関連のハードウェア

コンピュータによって違う。ここでは、PC/AT の話。

◆基本的なモデル

ハードウェアは、簡単。発振器とカウントダウンするカウンタを使う。

図? タイマ関連のハードウェアの基本モデル

発振器は、一定の周波数でカウンタに信号を送る。
発振器の周波数は、ソフトウェアで設定可能な場合がある。
カウンタは、ソフトウェアで読み書きできる。
カウンタは、０になったら、信号を生成する。それにより、 CPU に割込みがかかる。
カウンタのモードが一回切り(one shot mode)なら、これで終わり。
カウンタのモードが定期的(periodic mode)なら、別のレジスタから値を再設定する。

カウントダウンではなくカウントアップするものもある。

◆PIT (Programmable Interval Timer)

古いタイマ・デバイス。

発振器の周波数は、 1193182Hz。 NTSC 信号の 1/3。当時、実装が容易。ソフトウェアからは使いにくい。
カウンタ、再設定用のレジスタは、16ビット。
PIT デバイスは、3 つのタイマを持つ。
- 0: 割込み生成可能。OSの時間管理に適する。
- 1: Dynamic RAM のリフレッシュ。
- 2: スピーカーのトーン生成

Linux は、0 番を主タイマとして利用している。

◆CMOS RTC (Real Time Clock)

電源オフ時にも、バッテリで動作している。BIOS 用の設定を保持するメモリの一部。当時、CMOS は、「低消費電力」用としてだけ使われていた。今は、普通。　

2つの機能がある。

TOD clock: 時刻を、year/month/day hour:minute:second という形式で持つ。秒以下は読めない。
定期的な割込み用: 2Hz から 8192Hz の範囲で、2 の冪乗の周期で割込みを起こせる。

制約

カウンタの値を読み書きできない
初期値は、2 の冪乗しか設定できない。

その他の割込み

TOD クロックが更新された。毎秒１回。
目覚まし。TOD クロックが設定された時刻と一致した時。

◆Local APIC (Advanced Programmable Interrupt Controller) Timers

Local APIC は、割込みのルーティングに使われる。マルチプロセッサでも、CPU 毎に独立。 Pentium 以降では、CPU に内蔵。

カウンタと初期値は 32 ビット。
入力周波数は、CPU のフロントサイド・メモリ・バスの周波数と同じ。 (DDR 用以前)
PIT や CMOS RTC より高精度だが、ソフトウェアから周波数を調べる方法がない。 (PIT や CMOS RTCを使って調べるしかない。)

◆ACPI (Advanced Configuration and Power Interface)

チップセット・タイマとも呼ばれる。

発振器は、3579545 Hz (PIT の 3 倍)
カウンタは、24ビット。
初期値を保持するレジスタがない。24 ビットで一周する。
上位ビットが変化すると割込みがかかる。
省電力モードでも、通常通り動き続ける

Linux kernel 2.6 は、PIT を補正するのに使う。

◆TSC (Time Stamp Counter)

Pentium 以降で利用可能。

発振器は、CPU クロック。高精度。
カウンタは、64 ビット。一周するのに、数年かかる。
割り込みを起こす機能はない。
初期値を保持するレジスタがない
ソフトウェアから読み出せる。(Pentium 4 では、遅い。)
ユーザからもアクセス可能にするのが普通。
ソフトウェアから周波数を調べる方法がない。 (PIT や CMOS RTCを使って調べるしかない。)
省電力で CPU の周波数を落すことがある。
SMP でも、CPU は同じ発振器の出力を使うので同じように増える。NUMA では動かない。

Linux の gettimeofday() で、PIT 以上の精度を出す時に使う。

■Liux における時刻・時間管理

◆jiffies変数

インターバル・タイマを使って、jiffies という変数を増やす。１回の割込みを tick という。
伝統的な Unix では、1秒間に 100 回-120 回のタイマ割込みを起こす。
Linux では、HZ というコンパイル時の定数で設定される。 100, 250, 1000 などに設定するのが普通。

/* kernel/timer.c */
u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
...
void do_timer(unsigned long ticks)
{
        jiffies_64 += ticks;
        update_times(ticks);
}

/* kernel/time/tick-common.c */
static void tick_periodic(int cpu)
{
...
                do_timer(1);
...
        update_process_times(user_mode(get_irq_regs()));
...
}
void tick_handle_periodic(struct clock_event_device *dev)
{
...
        tick_periodic(cpu);
...
}

/* include/asm-x86/i8253.h */
extern struct clock_event_device *global_clock_event;

/* include/asm-x86/mach-default/do_timer.h */
static inline void do_timer_interrupt_hook(void)
{
        global_clock_event->event_handler(global_clock_event);
}

/* arch/x86/kernel/time_32.c */
irqreturn_t timer_interrupt(int irq, void *dev_id)
{
...
        do_timer_interrupt_hook();
...
}

/* arch/x86/mach-default/setup.c */
static struct irqaction irq0  = {
        .handler = timer_interrupt,
        .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL,
        .mask = CPU_MASK_NONE,
        .name = "timer"
};
...
void __init time_init_hook(void)
{
        irq0.mask = cpumask_of_cpu(0);
        setup_irq(0, &irq0);
}

◆xtime変数

xtime は、1970年1月1日00:00:00(GMT)からの秒数。gettimeofday() で返される値を nano 秒単位で保持している。実際の精度は、jiffies 単位だが、それより細かい単位を TSC 等の clocksource から読み込み補正する。

/* kernel/time.c */
asmlinkage long sys_gettimeofday(struct timeval __user *tv, struct timezone __user *tz)
{
        if (likely(tv != NULL)) {
                struct timeval ktv;
                do_gettimeofday(&ktv);
                if (copy_to_user(tv, &ktv, sizeof(ktv)))
                        return -EFAULT;
        }
        if (unlikely(tz != NULL)) {
                if (copy_to_user(tz, &sys_tz, sizeof(sys_tz)))
                        return -EFAULT;
        }
        return 0;
}

/* kernel/time/timekeeping.c */

struct timespec xtime __attribute__ ((aligned (16)));
...
void do_gettimeofday(struct timeval *tv)
{
        struct timespec now;

        __get_realtime_clock_ts(&now);
        tv->tv_sec = now.tv_sec;
        tv->tv_usec = now.tv_nsec/1000;
}

static inline void __get_realtime_clock_ts(struct timespec *ts)
{
        unsigned long seq;
        s64 nsecs;

        do {
                seq = read_seqbegin(&xtime_lock);

                *ts = xtime;
                nsecs = __get_nsec_offset();

        } while (read_seqretry(&xtime_lock, seq));

        timespec_add_ns(ts, nsecs);
}

static inline s64 __get_nsec_offset(void)
{
        cycle_t cycle_now, cycle_delta;
        s64 ns_offset;

        /* read clocksource: */
        cycle_now = clocksource_read(clock);

        /* calculate the delta since the last update_wall_time: */
        cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

        /* convert to nanoseconds: */
        ns_offset = cyc2ns(clock, cycle_delta);

        return ns_offset;
}

static void update_wall_time(void)
{
... 
                        xtime.tv_sec++;
...	
        xtime.tv_nsec = (s64)clock->xtime_nsec >> clock->shift;
...
}

/* kernel/timer.c */
static inline void update_times(unsigned long ticks)
{
        update_wall_time();
        calc_load(ticks);
}

◆time slice

/* kernel/sched.c */
void scheduler_tick(void)
{
        struct rq *rq = cpu_rq(cpu);
        struct task_struct *curr = rq->curr;
...
        curr->sched_class->task_tick(rq, curr);
...
}

/* kernel/timer.c */
void update_process_times(int user_tick)
{
...
	scheduler_tick();
...
}

/* kernel/sched_fair.c */
static void task_tick_fair(struct rq *rq, struct task_struct *curr)
{
        struct cfs_rq *cfs_rq;
        struct sched_entity *se = &curr->se;

        for_each_sched_entity(se) {
                cfs_rq = cfs_rq_of(se);
                entity_tick(cfs_rq, se);
        }
}

static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
        /*
         * Update run-time statistics of the 'current'.
         */
        update_curr(cfs_rq);

        if (cfs_rq->nr_running > 1 || !sched_feat(WAKEUP_PREEMPT))
                check_preempt_tick(cfs_rq, curr);
}

static void update_curr(struct cfs_rq *cfs_rq)
{
        struct sched_entity *curr = cfs_rq->curr;
        u64 now = rq_of(cfs_rq)->clock;
        unsigned long delta_exec;
        delta_exec = (unsigned long)(now - curr->exec_start);
        __update_curr(cfs_rq, curr, delta_exec);
        curr->exec_start = now;
...
}

static inline void
__update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
              unsigned long delta_exec)
{
...
        curr->sum_exec_runtime += delta_exec;
        delta_exec_weighted = delta_exec;
...
        curr->vruntime += delta_exec_weighted;
        if (first_fair(cfs_rq)) {
                vruntime = min_vruntime(curr->vruntime,
                                __pick_next_entity(cfs_rq)->vruntime);
        } else
                vruntime = curr->vruntime;

        cfs_rq->min_vruntime =
                max_vruntime(cfs_rq->min_vruntime, vruntime);
}

static void
check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
        unsigned long ideal_runtime, delta_exec;

        ideal_runtime = sched_slice(cfs_rq, curr);
        delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
        if (delta_exec > ideal_runtime)
                resched_task(rq_of(cfs_rq)->curr);
}

CPU スライスを使い果たしたら、resched_task() で、次のプロセス(task)に切り替える。

◆Unix callout待ち行列

n tick 後に、関数を実行する仕組み。伝統的な Unix での実装では、callout 待ち行列と呼ばれる仕組みを使う。Linux は、少し違う。

個々の要素に、n tick 後に実行すべきかの値を置く。
先に実行する関数を、キューの先頭に近い場所に置く。tick は、差分で置く。
tick ごとに、先頭の値を減らしていく。
0 になったら、キューから取り除いて呼び出す

◆例: 0 tick 後

10 tick 後に f(x) を実行
200 tick 後に g(y) を実行
200 tick 後に h(z) を実行

図? callout 待ち行列(1)

◆例: 10 tick 後

f(x) を取り除いて実行した後。

190 tick 後に g(y) を実行
190 tick 後に h(z) を実行

図? callout 待ち行列(2)

◆例: 200 tick 後

g(y), h(z) を取り除いて実行した後。

図? callout 待ち行列(3)

◆Linux タイマリスト

include/linux/timer.h:

struct timer_list {
        struct list_head entry;
        unsigned long expires;

        void (*function)(unsigned long);
        unsigned long data;

        struct tvec_t_base_s *base;
};

jiffies が増加して expires (絶対時刻)に達すれば、(*function)(data) を呼ぶ。

主に次の関数で操作する。 include/linux/timer.h:

static inline void add_timer(struct timer_list *timer)
{
        BUG_ON(timer_pending(timer));
        __mod_timer(timer, timer->expires);
}

利用例。

    timer->expires = jiffies + delay;
    timer->data = (unsigned long)data;
    timer->function = func;
    add_timer(timer);

add_timer(), add_timer_on(): イベントの登録。_on() は、CPU ごと。
mod_timer(): イベントの起動時刻の変更
del_timer(), del_timer_sync(), try_to_del_timer_sync(): イベントのキャンセル

◆Linux tv1..tv5

内部的には、timer_list は、tv1からtv5の5つの表に分解されて保存されている。

図? Linux tv

tv1 は、256 (TVR_SIZE) エントリ、tv2..tv5 は、64 エントリ (TVN_SIZE)。
tv1 は、各 tick に対応している。
タイマ割り込みで、jiffies % 256 にある time_list のイベントを実行する。
一周したら、tv2 にあるイベントを tv1 に移動させる。

/* timer.c: */
static void run_timer_softirq(struct softirq_action *h)
{
        tvec_base_t *base = __get_cpu_var(tvec_bases);

        hrtimer_run_queues();

        if (time_after_eq(jiffies, base->timer_jiffies))
                __run_timers(base);
}

static inline void __run_timers(tvec_base_t *base)
{
        struct timer_list *timer;
...
        while (time_after_eq(jiffies, base->timer_jiffies)) {
                struct list_head work_list;
                struct list_head *head = &work_list;
                int index = base->timer_jiffies & TVR_MASK;
...
                ++base->timer_jiffies;
                list_replace_init(base->tv1.vec + index, &work_list);
                while (!list_empty(head)) {
                        void (*fn)(unsigned long);
                        unsigned long data;
..
                        timer = list_entry(head->next,struct timer_list,entry);
                        fn = timer->function;
                        data = timer->data;
...
                                fn(data);
		}
	}
}

◆do_setitimer()

/* kernel/itimer.c */
int do_setitimer(int which, struct itimerval *value, struct itimerval *ovalue)
{
        struct task_struct *tsk = current;
        struct hrtimer *timer;
...
        switch (which) {
        case ITIMER_REAL:
...
                timer = &tsk->signal->real_timer;
...
                tsk->signal->it_real_incr =
                        timeval_to_ktime(value->it_interval);
                expires = timeval_to_ktime(value->it_value);
                if (expires.tv64 != 0)
                        hrtimer_start(timer, expires, HRTIMER_REL);
...
        }
}

◆High-resolution kernel timers

用途

インターバルタイマ
POSIX タイマ
nanosleep()
より細かいカーネル内のスケジューリング

hrtimer の管理には、木構造が使われている。O(log n)。

/* kernel/hrtimer.c */
int
hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
{
...
	base = lock_hrtimer_base(timer, &flags);
	new_base = switch_hrtimer_base(timer, base);
        timer->expires = tim;
        enqueue_hrtimer(timer, new_base);
}

static void enqueue_hrtimer(struct hrtimer *timer,
                            struct hrtimer_clock_base *base, int reprogram)
{
        struct rb_node **link = &base->active.rb_node;
        while (*link) {
                parent = *link;
                entry = rb_entry(parent, struct hrtimer, node);
                if (timer->expires.tv64 < entry->expires.tv64) {
                        link = &(*link)->rb_left;
                } else {
                        link = &(*link)->rb_right;
                        leftmost = 0;
                }
        }
        if (leftmost) {
...
                base->first = &timer->node;
        }
        rb_link_node(&timer->node, parent, link);
...
}

タイマ割込みの run_timer_softirq() から呼ばれる。

void hrtimer_run_queues(void)
{
...
        for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
                run_hrtimer_queue(cpu_base, i);
}

static inline void run_hrtimer_queue(struct hrtimer_cpu_base *cpu_base,
                                     int index)
{
        while ((node = base->first)) {
                struct hrtimer *timer;
                enum hrtimer_restart (*fn)(struct hrtimer *);
                int restart;
...
                fn = timer->function;
                __remove_hrtimer(timer, base, HRTIMER_STATE_CALLBACK, 0);
                restart = fn(timer);
...
                timer->state &= ~HRTIMER_STATE_CALLBACK;
                if (restart != HRTIMER_NORESTART) {
                        enqueue_hrtimer(timer, base, 0);
                }
        }
}

Last updated: 2008/02/04 22:23:42

Yasushi Shinjo / <yas@is.tsukuba.ac.jp>