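SunRPC and NFS

College of Information Sciences, Distributed Systems                January 30, 2007

                                       Graduate School of Systems and Information Engineering, University of Tsukuba
                                       Department of Computer Science / Institute of Information Sciences and Electronics
                                       Yasushi Shinjo
                                       <yas@is.tsukuba.ac.jp>

This page is located at the following URL.
http://www.coins.tsukuba.ac.jp/~yas/coins/dsys-2006/2007-01-30
Alternatively, it can be reached by following links from the pages below.
http://www.coins.tsukuba.ac.jp/~yas/
http://www.cs.tsukuba.ac.jp/~yas/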

■SunRPC

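SunRPC is an implementation of RPC developed by Sun Microsystems, whose specification and source code have been published. It is also called ONC RPC (Open Network Computing RPC). It has also been published as an RFC.

◆The rpcgen command

The stub generator of SunRPC is the following command.

rpcgen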

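◆The rpcgen command and files

To use the rpcgen command, create the following files.

Figure ?: Files used when developing an RPC program with rpcgen

name.x: Describes the interface.
name_client.c: The main program on the client side.
name_server.c: The server-side program that is called via RPC. (The main() function is generated automatically by rpcgen.)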

◆How to use the rpcgen command

% rpcgen name.x

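The following four files are generated.

name.h: Constants, data structures, and the interfaces of the stub procedures used by the RPC program.
name_clnt.c: The client-side stub.
name_xdr.c: The XDR procedures (the procedures that perform marshalling and unmarshalling) for the data structures defined in name.x. Used on both the client side and the server side.
name_svc.c: The server-side main function and dispatch procedure. It parses each RPC request it receives and calls the procedure defined by the developer.

The contents of these files are quite readable by humans.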

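◆Identifying a procedure

In SunRPC, a procedure is identified by the following information.

Host (IP address).
Program number. 32 bits. The range from 0x20000000 to 0x3fffffff in hexadecimal (536870912 to 1073741823 in decimal) is available to users. Widely used numbers are listed in /etc/rpc.
Version number. 32 bits. A small integer.
Protocol. TCP/IP or UDP/IP.
Procedure number.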
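
The program number range above can be checked mechanically. The following is a minimal sketch in C; the helper names is_user_prognum() and print_prognum() are hypothetical and are not part of SunRPC. It also prints a program number in the decimal form that rpcinfo displays.

```c
#include <stdio.h>

/* Hypothetical helper: nonzero if prognum lies in the range that is
   available to users (0x20000000 .. 0x3fffffff). */
int is_user_prognum(unsigned long prognum)
{
    return prognum >= 0x20000000UL && prognum <= 0x3fffffffUL;
}

/* Hypothetical helper: print a program number in hexadecimal and in
   decimal, together with the result of the range check. */
void print_prognum(unsigned long prognum)
{
    printf("0x%08lx = %lu (%s)\n", prognum, prognum,
           is_user_prognum(prognum) ? "user range" : "outside user range");
}
```

For example, print_prognum(0x20001003UL) reports the number as being in the user range, while print_prognum(100003UL) (the nfs entry in the rpcinfo listing) does not.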

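◆portmap

Figure ?: How the portmapper works

In SunRPC, data is ultimately sent over TCP/IP or UDP/IP. The program number and related identifiers therefore have to be mapped to a TCP/IP or UDP/IP port number.

Each host runs a server called portmap, which translates the triple

＜program number, version, protocol＞

into a TCP/IP or UDP/IP port number.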
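
The mapping that portmap maintains can be pictured as a small table keyed by the triple. The following is only an in-memory sketch of that idea, not the real portmap protocol: the names toy_pmap_set(), toy_pmap_getport(), and the table size are assumptions made for illustration.

```c
#include <stddef.h>

#define TOY_PMAP_MAX  16         /* hypothetical table size */
#define TOY_PROTO_TCP 6          /* IP protocol number of TCP */
#define TOY_PROTO_UDP 17         /* IP protocol number of UDP */

struct toy_mapping {
    unsigned long prog, vers, prot;
    unsigned short port;
};

static struct toy_mapping toy_table[TOY_PMAP_MAX];
static size_t toy_count;

/* Look up the port registered for <prog, vers, prot>; 0 means "not found". */
unsigned short toy_pmap_getport(unsigned long prog, unsigned long vers,
                                unsigned long prot)
{
    size_t i;

    for (i = 0; i < toy_count; i++)
        if (toy_table[i].prog == prog && toy_table[i].vers == vers &&
            toy_table[i].prot == prot)
            return toy_table[i].port;
    return 0;
}

/* Register <prog, vers, prot> -> port. Returns 1 on success, 0 if the
   triple is already registered or the table is full. */
int toy_pmap_set(unsigned long prog, unsigned long vers,
                 unsigned long prot, unsigned short port)
{
    if (toy_pmap_getport(prog, vers, prot) != 0 || toy_count == TOY_PMAP_MAX)
        return 0;
    toy_table[toy_count].prog = prog;
    toy_table[toy_count].vers = vers;
    toy_table[toy_count].prot = prot;
    toy_table[toy_count].port = port;
    toy_count++;
    return 1;
}
```

In this sketch a server would call toy_pmap_set() once at startup, and a client would call toy_pmap_getport() before its first call, which is the same division of work that the real portmap service provides over RPC.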

◆rpcinfo

The information held by portmap can be displayed with the rpcinfo command.

% rpcinfo -p 
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100007    2   tcp   1024  ypbind
    100007    2   udp   1027  ypbind
    100007    1   tcp   1024  ypbind
    100007    1   udp   1027  ypbind
    100005    1   udp    831  mountd
    100005    2   udp    831  mountd
    100005    1   tcp    834  mountd
    100005    2   tcp    834  mountd
    100003    2   udp   2049  nfs
    100001    2   udp   4193  rstatd
    100001    3   udp   4193  rstatd
    100001    4   udp   4193  rstatd
%

Information about other hosts can also be examined.

% rpcinfo -p hostname

A server registers itself with portmap when it starts.

pmap_set(prognum, versnum, protocol, port)
u_long prognum, versnum, protocol;
u_short port;

A client looks up the port number before making a call.

u_short
pmap_getport(addr, prognum, versnum, protocol)

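The stubs call these functions automatically, so there is normally no need to be aware of them.

portmap itself also operates via RPC. The port number of portmap itself is fixed at 111.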

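■Directory listing service

Directory listing service (dirlist).

The client sends the path name of a directory (a string) to the server.
The server returns the contents of the directory (a list) to the client.

◆Interface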

   1:	struct delist {
   2:	        string del_name<> ;
   3:	        struct delist *del_next ;
   4:	};
   5:	
   6:	union dirlist_res
   7:	switch(int dlr_errno)
   8:	{
   9:	case 0:
  10:	        delist *dlr_head;
  11:	default:
  12:	        void;
  13:	};
  14:	
  15:	program DIRLIST_PROG {
  16:	    version DIRLIST_VERSION {
  17:	        dirlist_res DIRLIST(string) = 11 ;
  18:	    } = 1;
  19:	} = 0x20001003 ;

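Line 17 is the definition of the procedure. DIRLIST is the name of the procedure (it defines a constant that denotes the procedure number).

The argument of this procedure is of type string, and the result is of type dirlist_res. In SunRPC, a procedure basically takes one argument and returns one result. When more are needed, define a structure.

The definition of the procedure DIRLIST is enclosed by the braces of the program number and the version number.

dirlist_res is a discriminated union (a variant structure, like a variant record in Pascal).

When the value of dlr_errno is 0: dlr_head, a pointer to delist
Otherwise: nothing is sent (only dlr_errno)

struct delist is a list structure that contains a pointer to itself (del_next).

Figure ?: Structure of struct delist

In general, RPC cannot send data structures that contain pointers. SunRPC has a facility that recursively sends the single element a pointer points to. This makes it possible to send tree structures and linear lists.

string del_name<> is a string type. It is variable length (<>), and no maximum number of characters is specified (the inside of <> is empty).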
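
To make the recursive sending of a list concrete, the following sketch encodes a delist-shaped list by hand in the XDR wire format: each pointer is sent as a 4-byte flag (1 if an element follows, 0 otherwise), and each string as a 4-byte length followed by the bytes padded to a multiple of 4. This is an illustration only; real programs use the xdr_* routines generated by rpcgen, and the names toy_delist, toy_put_u32(), and toy_encode_delist() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as struct delist. */
struct toy_delist {
    const char *del_name;
    struct toy_delist *del_next;
};

/* Store v as a 32-bit big-endian integer at buf[off]; return the new offset. */
static size_t toy_put_u32(unsigned char *buf, size_t off, unsigned long v)
{
    buf[off]     = (unsigned char)((v >> 24) & 0xff);
    buf[off + 1] = (unsigned char)((v >> 16) & 0xff);
    buf[off + 2] = (unsigned char)((v >> 8) & 0xff);
    buf[off + 3] = (unsigned char)(v & 0xff);
    return off + 4;
}

/* Encode the list starting at head into buf (capacity cap bytes).
   Returns the number of bytes written, or 0 if buf is too small. */
size_t toy_encode_delist(const struct toy_delist *head,
                         unsigned char *buf, size_t cap)
{
    size_t off = 0;
    const struct toy_delist *p;

    for (p = head; p != NULL; p = p->del_next) {
        size_t len = strlen(p->del_name);
        size_t padded = (len + 3) & ~(size_t)3;   /* round up to 4 bytes */

        if (off + 8 + padded + 4 > cap)           /* keep room for the end flag */
            return 0;
        off = toy_put_u32(buf, off, 1);           /* an element follows */
        off = toy_put_u32(buf, off, (unsigned long)len);
        memset(buf + off, 0, padded);
        memcpy(buf + off, p->del_name, len);
        off += padded;
    }
    if (off + 4 > cap)
        return 0;
    return toy_put_u32(buf, off, 0);              /* no more elements */
}
```

With this layout a two-element list holding "a" and "bcd" occupies 28 bytes: 12 bytes per element plus the 4-byte end flag.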

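◆Header file

From the interface definition, the rpcgen command generates a header file, the client-side stub, the server-side stub, and the program that performs marshalling with XDR.

The main part of the header file is as follows.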

[dirlist.h]

struct delist {
	char *del_name;
	struct delist *del_next;
};
typedef struct delist delist;

struct dirlist_res {
	int dlr_errno;
	union {
		delist *dlr_head;
	} dirlist_res_u;
};
typedef struct dirlist_res dirlist_res;

#define DIRLIST_PROG ((u_long)0x20001003)
#define DIRLIST_VERSION ((u_long)1)

#define DIRLIST ((u_long)11)
extern  dirlist_res * dirlist_1(char **, CLIENT *);
extern  dirlist_res * dirlist_1_svc(char **, struct svc_req *);

◆Server-side program

[dirlist_server.c]

/*
        dirlist_server.c -- RPC program that displays the contents of a directory (server side)
        Created on: 2006/01/18 20:33:31
*/

#include <sys/types.h>          /* opendir(3) */
#include <dirent.h>             /* opendir(3) */
#include <errno.h>              /* errno */
#include <stdlib.h>             /* malloc() */
#include <string.h>             /* strlen() */
#include <stdio.h>              /* snprintf() */

#include "dirlist.h"
static struct delist *make_delist( DIR *dirp );

dirlist_res *
dirlist_1_svc(char **argp, struct svc_req *rqstp)
{
    static dirlist_res  result;
    DIR *dirp ;
    char *dirname ;

        /* Free the list built for the previous call before reusing result. */
        xdr_free((xdrproc_t)xdr_dirlist_res, (char *)&result);

        dirname = *argp ;
        dirp = opendir( dirname );
        if( dirp == NULL )
        {
            result.dlr_errno = errno ;
            return( &result );
        }
        result.dlr_errno = 0;
        result.dirlist_res_u.dlr_head = make_delist( dirp );
        closedir( dirp );
        return( &result );
}

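On the server side, you write the procedure dirlist_1_svc(). Its argument and result are pointers to the types defined in the rpcgen source.

The question is how to return the pointer to the structure that holds the result. If the structure is an automatic variable (auto variable), it becomes invalid the moment control returns to the caller. A common approach is to assign the result to a static variable and return its address, but this becomes a serious problem in multithreaded programming.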
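
The hazard of returning a static variable can be seen without any RPC code. In the sketch below (all names are hypothetical), every call to toy_call_static() returns the same address, so a second call overwrites the result of the first. The alternative, in which the caller supplies the storage, is shown only as one possible way to avoid the shared state; it is not the interface that rpcgen generates in this example.

```c
struct toy_result {
    int value;
};

/* Returns a pointer to a static variable, in the style of dirlist_1_svc().
   All calls share the same storage. */
struct toy_result *toy_call_static(int v)
{
    static struct toy_result result;

    result.value = v;
    return &result;
}

/* Alternative sketch: the caller supplies the storage, so two calls
   (or two threads) do not overwrite each other's result. */
struct toy_result *toy_call_r(int v, struct toy_result *out)
{
    out->value = v;
    return out;
}
```

If two threads called toy_call_static() at the same time, each could observe the other's value, which is the multithreading problem mentioned above.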

opendir(), readdir(), and closedir() are library functions for obtaining the contents of a directory. Code like the following displays the contents of "/" (like ls -f /).

	dirp = opendir( "/" );
	while( (dp = readdir(dirp)) != NULL )
	{
	    printf("%s\n", dp->d_name );
	}
	closedir( dirp );

readdir() is called inside make_delist().

static struct delist *
make_delist( DIR *dirp )
{
    struct dirent *dp ;
    struct delist *del ;
    int namelen;
        if( (dp = readdir(dirp)) == NULL )
        {
            return( NULL );             /* end of directory: end of list */
        }
        else
        {
            del = malloc(sizeof(struct delist));
            if( del == NULL )
                return( NULL );         /* out of memory: truncate the list */
            namelen = strlen( dp->d_name );
            del->del_name = malloc( namelen+1 );
            if( del->del_name == NULL )
            {
                free( del );
                return( NULL );
            }
            snprintf(del->del_name,namelen+1,"%s",dp->d_name );
            del->del_next = make_delist( dirp );
            return( del );
        }
}

make_delist() builds the list of struct delist by recursive calls.

The automatically generated main program of the server registers the program number and version number with the port mapper.

◆Client-side program

[dirlist_client.c]

/*
        dirlist_client.c -- RPC program that displays the contents of a directory (client side)
        Created on: 2006/01/18 20:57:49
*/

#include <stdlib.h>             /* exit() */
#include <stdio.h>              /* printf() */
#include <string.h>             /* strerror() */

#include "dirlist.h"

void dirlist( char *host, char *dir );
void print_dirlist_res( dirlist_res *result );
void print_delist( struct delist *del );

int
main( int argc, char *argv[] )
{
        if( argc != 3 )
        {
            fprintf(stderr,"usage: %% %s server_host dir\n", argv[0]);
            exit( 1 );
        }
        dirlist( argv[1], argv[2] );
        return( 0 );
}

void
dirlist(char *host, char *dir)
{
        CLIENT *clnt;
        dirlist_res  *result;
        char *arg;

        clnt = clnt_create (host, DIRLIST_PROG, DIRLIST_VERSION, "tcp");
        if( clnt == NULL )
        {
            clnt_pcreateerror( host );
            exit( 1 );
        }

        arg = dir;
        result = dirlist_1( &arg, clnt );
        if( result == NULL )
        {
            clnt_perror( clnt, "call failed");
            exit( 1 );
        }
        print_dirlist_res( result );
        xdr_free( (xdrproc_t)xdr_dirlist_res, (char *)result );
        clnt_destroy( clnt );
}

The arguments of main() are the name of the host on which the server is running and a directory.

clnt_create() allocates a CLIENT structure. Its arguments are the server's host name, the program number, the version number, and the protocol used for communication.

dirlist_1(), called from dirlist(), is the stub generated by rpcgen. Its arguments are a pointer to the argument type defined in the interface (here char **, a pointer to the string) and a pointer to the CLIENT structure. Its result is a pointer to the dirlist_res structure defined in the interface. This function is contained in the file *_clnt.c generated by rpcgen.

On success, memory is allocated with malloc() inside the stub dirlist_1(), and the reply message received from the server is unmarshalled and stored there. Once this memory is no longer needed, release it with xdr_free().

	free( result );

With this call, at most the memory of the top-level structure would be released. (In stubs generated by rpcgen, the top-level structure is typically a static variable inside the stub, in which case free() must not be applied to it at all.) To release the memory of the structures and strings that it points to internally, call xdr_free().

void
print_dirlist_res( dirlist_res *result )
{
        printf("errno: %d (%s)\n",
               result->dlr_errno, strerror(result->dlr_errno));
        switch( result->dlr_errno )
        {
        case 0:
            print_delist( result->dirlist_res_u.dlr_head );
            break;
        default:
            break;
        }
}

void
print_delist( struct delist *del )
{
        if( del == NULL )
        {
        }
        else
        {
            printf("%s\n",del->del_name );
            print_delist( del->del_next );
        }
}

print_dirlist_res() displays a struct dirlist_res. print_delist() displays a struct delist. Where the structure is recursive, the function is recursive as well.
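
Releasing such a list needs the same kind of walk over the structure as printing it. The following sketch frees a delist-shaped list by hand, releasing both the name and the node for every element. It only illustrates what a deep free has to do for this data type and is not the implementation of xdr_free(); the names toy_node, toy_push(), and toy_deep_free() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as struct delist. */
struct toy_node {
    char *del_name;
    struct toy_node *del_next;
};

/* Allocate a node holding a copy of name and put it in front of next.
   Returns NULL if memory cannot be allocated. */
struct toy_node *toy_push(struct toy_node *next, const char *name)
{
    struct toy_node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->del_name = malloc(strlen(name) + 1);
    if (n->del_name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->del_name, name);
    n->del_next = next;
    return n;
}

/* Free every name and every node; returns the number of free() calls.
   A single free(head) would release only the first node. */
int toy_deep_free(struct toy_node *head)
{
    int freed = 0;

    while (head != NULL) {
        struct toy_node *next = head->del_next;

        free(head->del_name);
        free(head);
        freed += 2;
        head = next;
    }
    return freed;
}
```

For a list of two entries, toy_deep_free() performs four free() calls, whereas free() on the head alone would leave three blocks allocated.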

◆Compiling

% wget http://www.coins.tsukuba.ac.jp/~yas/coins/dsys-2006/2007-01-30/ex/dirlist.x 
% wget http://www.coins.tsukuba.ac.jp/~yas/coins/dsys-2006/2007-01-30/ex/dirlist_server.c 
% wget http://www.coins.tsukuba.ac.jp/~yas/coins/dsys-2006/2007-01-30/ex/dirlist_client.c 
% wget http://www.coins.tsukuba.ac.jp/~yas/coins/dsys-2006/2007-01-30/ex/Makefile 
% emacs Makefile 
% make 
rpcgen dirlist.x
gcc -g -DDEBUG    -c -o dirlist_clnt.o dirlist_clnt.c
gcc -g -DDEBUG    -c -o dirlist_client.o dirlist_client.c
gcc -g -DDEBUG    -c -o dirlist_xdr.o dirlist_xdr.c
gcc -g -DDEBUG     -o dirlist_client  dirlist_clnt.o dirlist_client.o dirlist_xdr.o -lnsl 
gcc -g -DDEBUG    -c -o dirlist_svc.o dirlist_svc.c
gcc -g -DDEBUG    -c -o dirlist_server.o dirlist_server.c
gcc -g -DDEBUG     -o dirlist_server  dirlist_svc.o dirlist_server.o dirlist_xdr.o -lnsl
%

On Solaris and Linux, -lnsl is required. On Solaris and MacOSX, compiling with -DDEBUG produces a server that does not fork() and runs in the foreground.

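◆Running the server

To run the programs, open two windows and run the server in one and the client in the other. Start the server first.

% ./dirlist_server

The server does not stop by itself; press ^C when you want to stop it.

Running rpcinfo -p in another terminal on the host where the server is running shows the program with program number 536875011 (0x20001003).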

% rpcinfo -p 
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp   1013  status
    100024    1   tcp    974  status
    100021    0   udp   1000  nlockmgr
    100021    1   udp   1000  nlockmgr
    100021    3   udp   1000  nlockmgr
    100021    4   udp   1000  nlockmgr
    100021    0   tcp    973  nlockmgr
    100021    1   tcp    973  nlockmgr
    100021    3   tcp    973  nlockmgr
    100021    4   tcp    973  nlockmgr
 536875011    1   udp  53972
 536875011    1   tcp  60723
%

On MacOSX 10.4, portmap may not be running. If rpcinfo -p displays nothing, run the following as root.

launchctl start com.apple.portmap

◆Running the client

When running the client on the same host as the server, it is convenient to use localhost as the host name.

% ./dirlist_client localhost /usr 
errno: 0 (Unknown error: 0)
.
..
.DS_Store
bin
epkg
include
info
lib
libexec
local
local3
sbin
share
standalone
X11R6
%

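■Data types available in SunRPC

The data structures that can be used in an RPC interface are almost the same as those of the C language, but some parts are extended to handle variable-length data and some parts are restricted.

Basic types

Integers: char, short, int (32 bits), long, long long (hyper, 64 bits)

The unsigned variant of each.

Floating point: float, double, (quadruple, quadruple precision)

Enumerations

Can be written almost as in C. They are typedef'ed automatically.

Structures

Can be written almost as in C. They are typedef'ed automatically.

Constants

Constants can be written in an RPC interface definition.

const LSIZE = 80 ;

rpcgen replaces this with a #define macro of the C preprocessor.

Arrays

Arrays can be defined in an RPC interface definition. There are two kinds, fixed length and variable length. Fixed-length arrays are the same as in C. For variable-length arrays, write <> instead of [] as follows.

typedef int varray<>;

The maximum length can also be written inside <>. rpcgen replaces this with a structure like the following.

typedef struct {
	u_int varray_len;
	int *varray_val;
} varray;

When writing the program in C, set varray_len to the number of elements and varray_val to a pointer to the first element of the array.

int a1[10] ;
varray v1 ;

	v1.varray_len = 10 ;
	v1.varray_val = &a1[0] ;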
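
The length field is what travels first on the wire. As a rough sketch, assuming the standard XDR layout for a variable-length array (a 4-byte element count followed by the elements, each int as 4 big-endian bytes), the encoding can be written by hand as follows. The struct and function names are hypothetical; generated code uses the xdr_* routines instead.

```c
#include <stddef.h>

/* Hypothetical stand-in with the same fields as the generated varray. */
struct toy_varray {
    unsigned int varray_len;
    int *varray_val;
};

/* Store v as a 32-bit big-endian integer at buf[off]; return the new offset. */
static size_t toy_store_u32(unsigned char *buf, size_t off, unsigned long v)
{
    buf[off]     = (unsigned char)((v >> 24) & 0xff);
    buf[off + 1] = (unsigned char)((v >> 16) & 0xff);
    buf[off + 2] = (unsigned char)((v >> 8) & 0xff);
    buf[off + 3] = (unsigned char)(v & 0xff);
    return off + 4;
}

/* Encode v into buf (capacity cap bytes): element count first, then the
   elements. Returns the number of bytes written, or 0 if buf is too small. */
size_t toy_encode_varray(const struct toy_varray *v,
                         unsigned char *buf, size_t cap)
{
    size_t need = 4 + 4 * (size_t)v->varray_len;
    size_t off = 0;
    unsigned int i;

    if (need > cap)
        return 0;
    off = toy_store_u32(buf, off, v->varray_len);
    for (i = 0; i < v->varray_len; i++)
        off = toy_store_u32(buf, off,
                            (unsigned long)(unsigned int)v->varray_val[i]);
    return off;
}
```

An array of three ints therefore takes 16 bytes in this layout, regardless of the byte order of the sending machine.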

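Strings

To send a string, use the special string type.

string s<> ;

In C this becomes char *. However, even if you write the following intending it to be a string, the string is not sent.

char *s ;

rpcgen also compiles this to char *, but in this case only the single character the pointer points to is sent. When you want to send a C string, always use string.

Pointers

SunRPC can also send pointer types, but only the single element the pointer points to is sent. When you want to send multiple elements, use an array.

argv

To send a structure like the arguments of main in C, do the following.

typedef string argstr_t<>; 
typedef argstr_t argvt<>;

opaque

To send a large amount of data as-is, use the

opaque

type. It comes in fixed-length and variable-length forms.

opaque fileblock[512] ;
opaque filedata<> ;

If an array of char is used instead of opaque, the data is treated as characters and converted one character at a time, which is very slow. In some cases character code conversion is performed. With opaque, no such conversion is performed at all and the data is sent as it is.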
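
The size difference can be estimated with a few lines of arithmetic. The sketch below assumes the standard XDR rules: opaque data is sent unchanged and only padded to a multiple of 4 bytes (with a 4-byte length in front for the variable-length form), whereas each element of a char array is assumed to be encoded as its own 4-byte unit. The function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of 4, the XDR unit size. */
size_t toy_xdr_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Wire size of a fixed-length opaque of n bytes, e.g. opaque fileblock[512]. */
size_t toy_opaque_fixed_size(size_t n)
{
    return toy_xdr_pad4(n);
}

/* Wire size of a variable-length opaque currently holding n bytes,
   e.g. opaque filedata<>: a 4-byte length plus the padded data. */
size_t toy_opaque_var_size(size_t n)
{
    return 4 + toy_xdr_pad4(n);
}

/* Wire size of a fixed-length char array of n elements, assuming one
   4-byte XDR unit per element. */
size_t toy_char_array_fixed_size(size_t n)
{
    return 4 * n;
}
```

Under these assumptions a 512-byte block takes 512 bytes as opaque but 2048 bytes as an array of char, in addition to the per-element conversion cost.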

union

A discriminated union (a variant structure) can be written in an RPC interface definition.

union int_result_t
switch( int status )
{
case OK:
	int	data ;
default:
	void;
};

This is compiled as follows.

struct int_result_t {
        int status;
        union {
                int data;
        } int_result_t_u;
};

data is valid only when the value of status is OK. That is, only then is the data part actually sent over the network.
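
On the C side this means the discriminant must be checked before the union member is touched. A minimal sketch follows. The constant TOY_OK stands in for OK, whose value the interface above does not show, so 0 is an assumption, and the helper names are hypothetical.

```c
#include <stddef.h>

enum { TOY_OK = 0 };             /* assumed value of OK */

/* Same layout as the structure generated from the union definition. */
struct int_result_t {
    int status;
    union {
        int data;
    } int_result_t_u;
};

/* Number of bytes sent for r: the 4-byte status, plus the 4-byte data
   only when status is TOY_OK. */
size_t toy_int_result_wire_size(const struct int_result_t *r)
{
    return r->status == TOY_OK ? 8 : 4;
}

/* Copy the data into *out and return 1 if it is valid; return 0 and
   leave *out untouched otherwise. */
int toy_get_data(const struct int_result_t *r, int *out)
{
    if (r->status != TOY_OK)
        return 0;
    *out = r->int_result_t_u.data;
    return 1;
}
```

A receiver that reads int_result_t_u.data when status is not OK is looking at memory that was never filled in from the network.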

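bool_t

The type bool_t can be used in an RPC interface definition. Its value is TRUE or FALSE.

Saving structures to a file with XDR

Using Sun's XDR technology, structures can be saved to a file.

A structure is marshalled and, instead of being sent to another process as a communication message, is saved to a file.

SunRPC provides a function called xdrstdio_create(). It makes it possible to read and write structures on stream input/output (FILE *).

As with RPC, if a structure contains pointers, the data they point to is saved to the file recursively. The file can be read on a different type of machine.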

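■NFS (Network File System)

NFS (Network File System) is the name (a proper noun and trademark) of the network file system developed by Sun Microsystems.

Network file system (common noun): a file system that allows files on another computer to be handled over the network as if they were files on one's own local disk.
Distributed file system: a network file system that has evolved to achieve distribution transparency.

NFS is the de facto standard on Unix-like operating systems (including MacOSX).

Other network file systems

SMB/CIFS
AFP (Apple Filing Protocol)
WebDAV
AT&T RFS (Remote File Sharing)
AFS (Andrew File System)

◆NFS features

With NFS, part of the file system on another computer can be mounted over the network into one's own file system tree, in the same way as a file system on a local disk.

Figure ?: Sharing files with NFS

Hosts can refer to each other's files.

How NFS works

◆Remote procedure calls in NFS

NFS uses SunRPC. Data is transferred over UDP/IP or TCP/IP (NFS v3 and later).

RPC server: the kernel of the host that has the disk
RPC client: the kernel of the host that accesses the files

On the server side, a program called nfsd appears to be running (it is displayed by the ps command), but it is not an ordinary process.

/usr/include/rpcsvc/nfs_prot.x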

Table ?: RPC procedures used by NFS

Procedure	Meaning	Related commands and system calls
null()	Do nothing	`rpcinfo -u hostname nfs` command
getattr()	Read attributes	`ls -l` command, `stat` system call, `open` system call
setattr()	Set attributes	`chmod`, `chown` commands
lookup()	Look up a file	`open` system call
readlink()	Read a symbolic link	`ls -l` command, `readlink` system call
read()	Read from a file	`read` system call
write()	Write to a file	`write` system call
create()	Create a file	`creat` system call, `open` system call
remove()	Remove a hard link	`rm` command, `unlink` system call
rename()	Rename a file	`mv` command, `rename` system call
link()	Create a hard link	`ln` command, `link` system call
symlink()	Create a symbolic link	`ln -s` command, `symlink` system call
mkdir()	Create a directory	`mkdir` command
rmdir()	Remove a directory	`rmdir` command
readdir()	Read a directory	`ls` command
statfs()	File system usage	`df` command, `statfs` system call
commit()*	Write to disk	`fsync` system call
access()*	Check access permission	`access` system call

* marks procedures that are new in NFS v3.

◆NFS file handle

An identifier used by NFS to distinguish files and directories. 32 bytes.

const NFS_FHSIZE	= 32;
...
/*
 * File access handle
 */
struct nfs_fh {
	opaque data[NFS_FHSIZE];
};

How is the very first NFS file handle obtained?

◆RPC program for NFS mounting

In NFS v2 and NFS v3, separately from NFS itself, there is an RPC program (MOUNTPROG) for obtaining the root of a directory tree. /usr/include/rpcsvc/mount.x

Procedure	Meaning	Related commands and system calls
null()	Do nothing	`rpcinfo -u hostname mount` command
mnt()	Returns an NFS file handle	`mount` command
dump()	List of mounts	`showmount hostname` command
umnt()	Unmount	`umount` command
umntall()	Unmount all	`umount -h hostname` command
export()	Returns the list of accessible directories

◆lookup()


2.2.5.  Look Up File Name

	diropres
	NFSPROC_LOOKUP(diropargs) = 4;

If the reply "status" is NFS_OK, then the reply "file" and reply
"attributes" are the file handle and attributes for the file "name"
in the directory given by "dir" in the argument.

2.3.10.  diropargs

    struct diropargs {
	fhandle  dir;
	filename name;
    };

The "diropargs" structure is used in directory operations.  The
"fhandle" "dir" is the directory in which to find the file "name".
A directory operation is one in which the directory is affected.

2.3.11.  diropres

    union diropres switch (stat status) {
    case NFS_OK:
	struct {
	    fhandle file;
	    fattr   attributes;
	} diropok;
    default:
	void;
    };

The results of a directory operation are returned in a "diropres"
structure.  If the call succeeded, a new file handle "file" and
the "attributes" associated with that file are returned along with
the "status".

◆read()

2.2.7.  Read From File

	struct readargs {
		fhandle file;
		unsigned offset;
		unsigned count;
		unsigned totalcount;
	};

	union readres switch (stat status) {
	case NFS_OK:
		fattr attributes;
		nfsdata data;
	default:
		void;
	};

	readres
	NFSPROC_READ(readargs) = 6;

Returns up to "count" bytes of "data" from the file given by "file",
starting at "offset" bytes from the beginning of the file.  The first
byte of the file is at offset zero.  The file attributes after the
read takes place are returned in "attributes".

Notes:  The argument "totalcount" is unused, and is removed in the
next protocol revision.

◆write()

2.2.9.  Write to File

	struct writeargs {
		fhandle file;
		unsigned beginoffset;
		unsigned offset;
		unsigned totalcount;
		nfsdata data;
	};

	attrstat
	NFSPROC_WRITE(writeargs) = 8;

Writes "data" beginning "offset" bytes from the beginning of "file".
The first byte of the file is at offset zero.  If the reply "status"
is NFS_OK, then the reply "attributes" contains the attributes of the
file after the write has completed.  The write operation is atomic.
Data from this "WRITE" will not be mixed with data from another
client's "WRITE".

Notes:  The arguments "beginoffset" and "totalcount" are ignored and
are removed in the next protocol revision.

◆getattr()

2.2.2.  Get File Attributes

	attrstat
	NFSPROC_GETATTR (fhandle) = 1;

If the reply status is NFS_OK, then the reply attributes contains the
attributes for the file given by the input fhandle.

2.3.9.  attrstat

    union attrstat switch (stat status) {
    case NFS_OK:
	fattr attributes;
    default:
	void;
    };

The "attrstat" structure is a common procedure result.  It
contains a "status" and, if the call succeeded, it also contains
the attributes of the file on which the operation was done.

2.3.5.  fattr

    struct fattr {
	ftype        type;
	unsigned int mode;
	unsigned int nlink;
	unsigned int uid;
	unsigned int gid;
	unsigned int size;
	unsigned int blocksize;
	unsigned int rdev;
	unsigned int blocks;
	unsigned int fsid;
	unsigned int fileid;
	timeval      atime;
	timeval      mtime;
	timeval      ctime;
    };

...
Notes:  The bits are the same as the mode bits returned by the
stat(2) system call in UNIX.  The file type is specified both in
the mode bits and in the file type.  This is fixed in future
versions.

The "rdev" field in the attributes structure is an operating
system specific device specifier.  It will be removed and
generalized in the next revision of the protocol.

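◆Idempotent operations

An idempotent operation is one that gives the same result no matter how many times it is repeated as when it is executed just once.

Examples:

Addition. It is also a function (no values are overwritten, and the result is determined by the arguments alone).
Reading a file at a specified position (the Unix pread() system call)
Writing a file at a specified position (the Unix pwrite() system call)

Operations that are not idempotent

Incrementing a variable by one
Reading a file without specifying a position (the Unix read() system call)
Writing a file without specifying a position (the Unix write() system call)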

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
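
The difference between the two pairs of calls can be observed on a local file. The following sketch writes eight bytes to a temporary file, then repeats pread() at offset 0 and read() from the start. The function name toy_idempotence_demo() is hypothetical. It returns 1 when the behavior matches the description above, 0 when it does not, and -1 on an I/O error.

```c
#define _XOPEN_SOURCE 700        /* for pread() and fileno() */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int toy_idempotence_demo(void)
{
    char a[4], b[4];
    int fd, ok;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fputs("abcdefgh", fp) == EOF || fflush(fp) == EOF) {
        fclose(fp);
        return -1;
    }
    fd = fileno(fp);

    /* pread(): the offset is part of the request, so repeating the same
       request returns the same data. */
    if (pread(fd, a, 4, 0) != 4 || pread(fd, b, 4, 0) != 4) {
        fclose(fp);
        return -1;
    }
    ok = (memcmp(a, b, 4) == 0);

    /* read(): the result depends on the position kept for the open file,
       so the second call returns the next four bytes. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
        read(fd, a, 4) != 4 || read(fd, b, 4) != 4) {
        fclose(fp);
        return -1;
    }
    ok = ok && memcmp(a, "abcd", 4) == 0 && memcmp(b, "efgh", 4) == 0;

    fclose(fp);
    return ok;
}
```

A retransmitted request behaves like the repeated pread(): because the position travels with the request, executing it twice does no harm.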

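◆Stateless server

In NFS Version 2, the server does not hold state (it is stateless).

The server does not hold variables such as the "position" for reading and writing a file. (The client holds them.)
For write(), the server always writes to disk before returning the reply.

Making operations idempotent and the server stateless makes the system robust against failures.

Even if a client crashes, no files are left open on the server.
If the server crashes, the client can recover by continuing to send requests.

Making the server stateless makes writes expensive. Linux does not keep this NFS Version 2 promise.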

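◆cookie

A technique needed to achieve properties such as idempotence and statelessness when using a communication service in which no connection is established, such as RPC.

Example: In the NFS directory-reading procedure nfsproc_readdir(), it can happen that not all of the data can be returned in a single RPC. The intermediate state that indicates how far the directory has been read is returned to the client in the form of a cookie.

In the next RPC call, the client includes the cookie from the reply it received last time in its request to the server.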
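
The exchange can be sketched with an ordinary function standing in for the remote procedure. In the code below the "server" keeps no position between calls; everything it needs arrives in the cookie argument. The directory contents, the use of an array index as the cookie, and the names toy_readdir() and toy_list_all() are assumptions for illustration. In real NFS the cookie is opaque to the client.

```c
#include <stddef.h>

/* Hypothetical directory contents held by the server. */
static const char *toy_dir[] = { ".", "..", "bin", "include", "lib" };
#define TOY_DIR_LEN (sizeof toy_dir / sizeof toy_dir[0])

/* Stateless "server" procedure: returns up to max names starting at the
   position encoded in cookie, plus the cookie for the next call and an
   end-of-directory flag. It keeps nothing between calls. */
size_t toy_readdir(unsigned int cookie, size_t max, const char **names,
                   unsigned int *next_cookie, int *eof)
{
    size_t n = 0;

    while (n < max && cookie < TOY_DIR_LEN)
        names[n++] = toy_dir[cookie++];
    *next_cookie = cookie;
    *eof = (cookie >= TOY_DIR_LEN);
    return n;
}

/* Client loop: start with cookie 0 and send back the cookie from the
   previous reply until eof. Returns the total number of entries seen. */
size_t toy_list_all(size_t per_call)
{
    const char *names[8];
    unsigned int cookie = 0;
    int eof = 0;
    size_t total = 0;

    if (per_call == 0 || per_call > 8)
        return 0;
    while (!eof)
        total += toy_readdir(cookie, per_call, names, &cookie, &eof);
    return total;
}
```

Because the cookie is part of the request, sending the same request twice returns the same entries, so a retransmission after a lost reply is harmless.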

◆readdir()

const NFS_COOKIESIZE	= 4;
typedef opaque nfscookie[NFS_COOKIESIZE];

2.2.17.  Read From Directory

	 struct readdirargs {
		 fhandle dir;
		 nfscookie cookie;
		 unsigned count;
	 };

	 struct entry {
		 unsigned fileid;
		 filename name;
		 nfscookie cookie;
		 entry *nextentry;
	 };

	 union readdirres switch (stat status) {
	 case NFS_OK:
		 struct {
			 entry *entries;
			 bool eof;
		 } readdirok;
	 default:
		 void;
	 };

	 readdirres
	 NFSPROC_READDIR (readdirargs) = 16;

 Returns a variable number of directory entries, with a total size of
 up to "count" bytes, from the directory given by "dir".  If the
 returned value of "status" is NFS_OK, then it is followed by a
 variable number of "entry"s.  Each "entry" contains a "fileid" which
 consists of a unique number to identify the file within a filesystem,
 the "name" of the file, and a "cookie" which is an opaque pointer to
 the next entry in the directory.  The cookie is used in the next
 READDIR call to get more entries starting at a given point in the
 directory.  The special cookie zero (all bits zero) can be used to
 get the entries starting at the beginning of the directory.  The
 "fileid" field should be the same number as the "fileid" in the the
 attributes of the file.  (See section "2.3.5. fattr" under "Basic
 Data Types".)  The "eof" flag has a value of TRUE if there are no
 more entries in the directory.

With nfsproc_readdir(), it is unclear what the result will be if the contents of the directory are updated between the first and the second RPC.

◆NFS asynchronous I/O daemon

The NFS asynchronous I/O daemon ( nfsiod (local NFS asynchronous I/O Daemon) or biod (asynchronous Block I/O Daemon) ) runs on an NFS client host and performs asynchronous NFS input and output.

Faster writes
Block read-ahead

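◆lockd and statd

NFS v2 and NFS v3 originally have no locking facility.

If the server crashes, locks may disappear.
If a client crashes, it may keep holding its locks.
If the network is disconnected, ...

A locking facility was added later.

lockd: Runs on both the client host and the server host. The lockd on the client host forwards the requests it receives to the lockd on the server. The lockd on the server host passes the requests received from the client's lockd to the kernel through system calls.
statd: The lockd on the server host queries the statd on the client host to check whether the client host has crashed. If the client host has crashed, the lock requests received from that client are released.

lockd had quite a lot of bugs from the beginning. Early versions of statd had a buffer overflow bug.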

◆NFS Version 3

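The stateless server was reconsidered.
A procedure called commit() was added. It can instruct the server to write to disk the writes made so far.
In addition to UDP/IP, access over TCP/IP is possible. Reads and writes can use large blocks.
The result of each procedure returns the file attributes (including the last modification time). This reduces the number of calls to the getattr() procedure.
File offsets were changed from 32 bits to 64 bits. Files of 4 GB or larger can be handled.
Access permission checking by a procedure called access().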

◆NFS Version 4

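2003

Introduction of the compound() procedure. To operate efficiently even on networks with large communication latency, multiple NFS procedures are executed together in a single remote procedure call.

Introduction of procedures such as open() and close()

Incorporation of the locking facility into the main protocol

Incorporation of the mount facility into the main protocol

User authentication through a mechanism called RPCSEC_GSS. Mainly for Kerberos support.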

◆CIFS

CIFS is the network file system used by Microsoft Windows.

Its former name is the SMB protocol. Samba is a program that implements CIFS/SMB on Unix-like operating systems.

Uses of Samba

Sharing files between Windows machines
Sharing files between Unix and Windows

■Exercises

Exercise (1) NFS mount client

Write a client-side program that executes one of the following procedures defined in /usr/include/rpcsvc/mount.x.

MOUNTPROC_DUMP
MOUNTPROC_EXPORT

Use orchid-fs1 or orchid-fs2 as the server.

Exercise (2) NFS client

Write a client-side program that executes one of the following procedures defined in /usr/include/rpcsvc/nfs_prot.x.

NFSPROC_GETATTR
NFSPROC_READLINK
NFSPROC_READ
NFSPROC_WRITE
NFSPROC_CREATE
NFSPROC_REMOVE
NFSPROC_RENAME
NFSPROC_LINK
NFSPROC_SYMLINK
NFSPROC_MKDIR
NFSPROC_READDIR
NFSPROC_STATFS
The equivalent NFSv3 procedures

In the coins environment, use orchid-fs1 as the server. In that case, for this experiment you may use one of the following as the file handle of a directory.

~yas/coins/dsys-2006/nfs/fh/dir1.fh
~yas/coins/dsys-2006/nfs/fh/dir2.fh
~yas/coins/dsys-2006/nfs/fh/dir3.fh
~yas/coins/dsys-2006/nfs/fh/file1.fh
~yas/coins/dsys-2006/nfs/fh/file2.fh
~yas/coins/dsys-2006/nfs/fh/file3.fh

When accessing the NFS server, initialize the cl_auth field of the CLIENT structure as follows.

        cl = clnt_create(server, NFS_PROGRAM, NFS_VERSION, "udp");
        cl->cl_auth = authunix_create_default();

A program that calls NFSPROC_LOOKUP is in the following file. It is a useful reference.

~yas/public_html/coins/dsys-2006/2007-01-30/nfs/lookup.c

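Exercise (3) dirlistlong service

Extend the directory listing service (dirlist) so that, like ls -l, it can display the type, mode, size, last modification time, and so on.

Hint: Obtain the file attributes with the stat() or lstat() system call. In the RPC, return the attributes as well, rather than just a string. Define the attributes to return yourself in the .x file and copy them from struct stat.

It is enough to send a few of the attributes in struct stat. You do not have to send all of them.

Exercise (4) getenv/putenv

Using getenv() and putenv()/setenv(), write a program that reads and writes the environment variables of an RPC server.

Note: strdup() the argument of putenv(). (If an environment variable with the same name already exists, garbage is left behind, but this is not considered a problem in this exercise.)

Exercise (5) Hash table

Implement with RPC a hash table that has the following interface.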

[hashtable.x]

typedef string key_t<256>;
struct keyvalue_t { 
   key_t key; 
   int   value ;
};
typedef key_t  keyarray_t<>;

program HASHTABLE_PROG { 
   version HASHTABLE_VERSION {
       int        PUT(keyvalue_t)  = 11 ; 
       int        GETVALUE(key_t)  = 12 ; 
       keyarray_t GETKEYS(void)    = 13 ; 
   } = 1 ;
} = 0x20051001 ;

Exercise (6) Counter

Implement a counter with the following procedures as an RPC server.

void up();
int getValue();
void reset(int newVal);

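Exercise (9) SunRPC free assignment

Create some other SunRPC server and client whose complexity is at least comparable to the exercises above. You may make a program that you have written yourself callable via RPC. Some programs are not suitable for conversion to RPC. If you choose this exercise, consult the instructor in advance.

■Assignment

Choose one of the exercises above and submit it. The deadline is Wednesday, February 7, 2007, 23:59:59. (Note: you may be asked to resubmit.)

◆How to submit the report

First, create a "text file" like the following. JIS, Shift-JIS, and EUC are accepted as the kanji code. (PDF, RTF, and word processor document files are not accepted. Even for text, Unicode (UTF) is not accepted. Do not bundle or compress the file with ZIP, tar, gzip, or the like.)

----------------------------------------------------------------------
Student ID: 200404321
Name: name in kanji
Assignment number: M
Exercise number: N
Title: subject

＜Content＞
----------------------------------------------------------------------

Write your student ID and name (in kanji, if you have a name in kanji) at the beginning of the body. Write "1" as the assignment number. The exercise number is the number displayed after "Exercise". For the title, give one that matches the content, as with the Subject: of an e-mail.

Write ＜Content＞ in Japanese (or English). Write complete sentences with predicates. Do not end sentences with a noun phrase. Do not simply include a program; write something like "The program for ○○ is shown below." ＜Content＞ should contain not only the program but also the execution results (input and output) and an explanation.

If you changed the problem to a more difficult one, or solved a difficult problem from the start, state this in ＜Content＞.

Submit the file you created from the following page.

Distributed Systems / Report submission page

Specify again the assignment number and title you wrote above. Then use the file selection function of your WWW browser to select the report file you created. Finally, press the "Submit" button.

Submitted reports can be displayed with the following button.

If the submission fails, or if it succeeds but the report does not appear on the confirmation screen, contact Shinjo (yas@is.tsukuba.ac.jp) or a TA.

When resubmitting a report, explain which parts you modified so that they are easy to identify.

Keep the report you submitted until the course ends.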

Last updated: 2007/02/06 19:30:30

Yasushi Shinjo / <yas@is.tsukuba.ac.jp>