Prelude:
I am trying to build a web site, and as a software developer what worries me most is a server crash. Even
if the server can be restarted, some resources are lost. This raises a question: can a crashed server program
be recovered from a backup copy, in the same way that lost data is recovered from a data backup? I once read
about "passing file descriptors" in W. Richard Stevens' Advanced Programming in the UNIX Environment. This powerful
ability, which is like 'dup/dup2' but works between processes, amounts to making a 'backup copy' of a descriptor. This
led me to the idea of using it for "process backup". Of course, this is not a backup of the process image; it backs
up the file descriptors of a process, including the network sockets I want to preserve.
I wrote a small program to try this idea and it worked. It still needs to be tested in a server program under
large-scale service.
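To make the "dup/dup2 between processes" idea concrete, here is a minimal sketch of sending one descriptor over a UNIX domain socket. The helper names send_fd and recv_fd are hypothetical and are not taken from the test code of this article; the sketch assumes a connected AF_UNIX stream socket and sends one ordinary data byte along with the descriptor, which is the commonly recommended practice.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper (not from the article's source): pass fd_to_send to the
// process at the other end of the UNIX domain socket `chan`.
// Returns 0 on success, -1 on failure.
int send_fd(int chan, int fd_to_send)
{
    char dummy = 'F';                      // one byte of ordinary data to carry the fd
    struct iovec iov;
    iov.iov_base = &dummy;
    iov.iov_len = 1;

    union {                                // union keeps the buffer aligned for cmsghdr
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr* c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;             // the ancillary payload is a descriptor
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

// Hypothetical counterpart: returns the newly installed descriptor, or -1.
int recv_fd(int chan)
{
    char dummy = 0;
    struct iovec iov;
    iov.iov_base = &dummy;
    iov.iov_len = 1;

    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(chan, &msg, 0) <= 0)       // error, or the peer closed the socket
        return -1;

    struct cmsghdr* c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;                         // no descriptor came with the message

    int fd = -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;                             // a new number referring to the same open file
}
```

The number returned by recv_fd is generally different from the number the sender used, but both refer to the same open file or socket, which is what makes the "backup copy" possible.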
The idea of the test below is:
(1) Simulate two client processes "chatting" through a server;
      S
     / \
    C1  C2
(2) Perform a "process backup" of the Server;
    S -> B
(3) The Master issues an instruction and the original Server "crashes";
(4) Once the crash is detected, start a new process listening on the original port
    (e.g. a websocket chat port);
(5) Get the resources (such as the fds and the chat relationships) ready for the new process;
(6) See whether the two clients can continue to chat.
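In this test the crash in step (4) is observed by hand and the restart is triggered by an instruction. As a rough sketch of how detection could be automated, a supervising process that is the parent of the server can learn from waitpid() that the child was killed by a signal. The helper name wait_for_crash is hypothetical, and the sketch assumes the supervisor itself forked the server.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>

// Hypothetical helper: wait for child `pid` to end.
// Returns the number of the signal that killed it (e.g. SIGSEGV),
// 0 if it ended by calling exit, or -1 if waitpid() failed.
int wait_for_crash(pid_t pid)
{
    int status = 0;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   // retry if interrupted by a signal
    if (r == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);           // the child was killed by this signal
    return 0;
}
```

A supervisor could call this for the server's pid and start the replacement server when the result is a positive signal number such as SIGSEGV.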
The source code:
server_crash_backup-restore_test_2.tgz for download
This test code has not been cleaned up yet, :-( .
After the .tgz file is uncompressed, the directories reuse_test/ and pass_fd_and_struct_test/ can be deleted.
It is better to keep tmp/, whose use will be seen below.
Four programs are involved: master (server), (data) server, (data) client, and backup_process,
each in a separate directory.
We will use two data client processes. They stand for two users (data clients) communicating
through one machine (the data server); because the server may crash, it is "backed up" to the
backup_process. The master issues the instructions. Each instruction is sent to every slave,
and each slave must pick out the instructions that concern it and act accordingly.
All programs other than the master are slaves/clients of the master. The (data) client
is a client of the (data) server, and the data server is a client of the backup_process.
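The rule that every slave receives every instruction and acts only on its own can be sketched as a small filter. The mapping follows the targets this article lists for each instruction in the detailed walk-through; the enum Role and the function concerns are hypothetical names, not part of the test code.

```cpp
#include <strings.h>   // strcasecmp: instructions are case-insensitive

// Hypothetical role tags for the three kinds of slave.
enum Role { DATA_SERVER, BACKUP_PROCESS, DATA_CLIENT };

// Returns true if a slave with the given role should act on `cmd`.
// CREATE and RESTART are handled by the master itself, so no slave acts on
// them; any instruction not listed here is ignored in this sketch.
bool concerns(Role role, const char* cmd)
{
    if (!strcasecmp(cmd, "1TO2") || !strcasecmp(cmd, "2TO1"))
        return role == DATA_SERVER || role == DATA_CLIENT;
    if (!strcasecmp(cmd, "BACKUP") || !strcasecmp(cmd, "RESTORE"))
        return role == DATA_SERVER || role == BACKUP_PROCESS;
    if (!strcasecmp(cmd, "CRASH"))
        return role == DATA_SERVER;
    return false;
}
```

Each slave would call such a filter on every line it receives from the master and silently skip the lines for which it returns false.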
First, enter each directory and run make to compile.
Since the backup process and the data server communicate through a (named) UNIX domain socket here,
the path must be specified. In backup_process/client.cpp, change
const char* path="/usr/working3/server_crash_backup-restore_test/tmp/tmp.sock"
to a path of your own. I tested on Baidu BCC, where it was changed to
const char* path="/usr/working/test/server_crash_backup-restore_test_2/tmp/tmp.sock" .
Do the same in server_test/client.cpp.
Then recompile.
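For readers unfamiliar with named UNIX domain sockets, the following is a minimal sketch of the two ends of such a channel, with the path passed as a parameter instead of being compiled in. The function names unix_listen and unix_connect are hypothetical. In this test the backup process plays the listening side and the data server the connecting side.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstring>

// Fill `addr` from `path`; fails if the path does not fit into sun_path.
static int fill_addr(struct sockaddr_un* addr, const char* path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

// Hypothetical helper: create a listening UNIX domain socket bound to `path`.
// Returns the listening descriptor, or -1 on failure.
int unix_listen(const char* path)
{
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) < 0)
        return -1;
    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    unlink(path);                          // remove a stale socket file from an earlier run
    if (bind(s, (struct sockaddr*)&addr, sizeof(addr)) < 0 || listen(s, 5) < 0) {
        close(s);
        return -1;
    }
    return s;
}

// Hypothetical helper: connect to the UNIX domain socket at `path`.
// Returns the connected descriptor, or -1 on failure.
int unix_connect(const char* path)
{
    struct sockaddr_un addr;
    if (fill_addr(&addr, path) < 0)
        return -1;
    int s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

Taking the path from a command-line argument or an environment variable in this way would avoid editing two source files and recompiling; the directory that holds the socket file (tmp/ here) must exist and be writable.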
Starting master:
(cd master-server_test ;)
./server_crash_test-master 5000 3
The parameter 3 indicates there will be 3 slaves at the very beginning,
two data clients, one data server.
Starting (data )server:
(cd server_test/ ;)
./server_test 5001
Starting clients:
(cd client_test;)
./client_test 127.0.0.1 5001 c1 (client no. 1)
./client_test 127.0.0.1 5001 c2 (client no. 2)
Now the master shows a prompt indicating that it is ready for instruction input. (All instructions
are issued by the master, and they are case-insensitive.)
>
Input some 1to2 / 2to1 instructions (simulating a chat between client 1 and client 2). The client
that receives a message displays the sequence number of that message (#1, #2, etc.). (You may need to press
the 'Enter' key to get the prompt back before entering the next instruction.)
Now start up the backup process:
>create
It should display that the backup process started OK and that it has connected to the master. The master
now has one more slave.
Do the backup:
>backup
Then make the server crash:
>crash
Watch the server's terminal; it should show a segmentation fault (core dumped).
>restart
Restarts the server (with the parameter restart, so that this time the data server does no 'accept').
A new data server should start OK. With the ps command one can see that the server process has re-emerged.
>restore
Restores the backed-up descriptors and data to the new server.
Input 1to2 / 2to1 instructions again at the > prompt; the two clients can still
send and receive data between them, as if the server crash had never happened.
Quit from the test:
>exit
Below I go through the above process and the instructions in detail, in order:
Start the Master; it will have the 3 slaves below;
Start the Server; it accepts 2 connections;
Start Client 1 (abbreviated as C1) with parameter C1; it connects to the Server, and the Server sets up its data structure;
Start Client 2 (abbreviated as C2) with parameter C2; it connects to the Server, and the Server sets up its data structure;
server_test/server.cpp:
data_sock = server( service);
data_sock2 = server( service);
fd_info[0].fd = data_sock;
fd_info[0].owner = 1;
fd_info[0].target = 2;
fd_info[0].paired = 1;
fd_info[0].fd_orig = data_sock;
fd_info[1].fd = data_sock2;
fd_info[1].owner = 2;
fd_info[1].target = 1;
fd_info[1].paired = 1;
fd_info[1].fd_orig = data_sock2;
1TO2/2TO1 instruct the Server and the Clients (the targets of these instructions):
1TO2 : Server gets ready to receive from C1; C1 sends a message to the Server; Server looks up C2;
Server sends the message to C2; C2 gets ready; C2 receives the message;
2TO1 : Server gets ready to receive from C2; C2 sends a message to the Server; Server looks up C1;
Server sends the message to C1; C1 gets ready; C1 receives the message;
This is just ordinary socket communication code.
CREATE : target of this instruction: the Master itself.
The Master starts up the Backup process.
master-server_test/server.cpp:
if(!strcasecmp(buf, "CREATE" ))
{
    int pid = fork();
    ......
    if(pid == 0)
    {
        // the argument list of execl must end with a null pointer of type char*
        execl("../backup_process/backup_process", "backup_process", (char*)NULL);
        ......
    }
}
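The fork/execl pattern used for CREATE (and again for RESTART) can be wrapped in a small helper that also covers the failure paths. This is a sketch only; the name spawn and its parameters are hypothetical and do not appear in the test code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>   // waitpid, for the parent to reap the child later
#include <unistd.h>

// Hypothetical helper: start the program at `path` as a child process.
// `name` becomes argv[0]; `arg1` is one optional argument (may be NULL).
// Returns the child's pid in the parent, or -1 if fork() failed.
pid_t spawn(const char* path, const char* name, const char* arg1)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        // Child: replace the process image. The list ends with a char* null pointer.
        execl(path, name, arg1, (char*)NULL);
        _exit(127);                        // reached only if execl() failed
    }
    return pid;                            // parent continues here
}
```

Under these assumptions, CREATE would be spawn("../backup_process/backup_process", "backup_process", NULL). The parent should eventually call waitpid() on the returned pid so that a terminated child does not remain as a zombie.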
BACKUP : targets of this instruction: the Server process and the Backup process.
The Server sends structure data to the Backup process (which processes the message).
The code is discussed later.
CRASH : target of this instruction: the Server.
RESTART: target of this instruction: the Master itself.
The Master starts up a new Server (with the parameter "restart", so that it skips the code
that the old Server ran to "accept" connections).
master-server_test/server.cpp:
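if(!strcasecmp(buf, "RESTART" ))
{
    int pid = fork();
    ......
    if(pid == 0)
    {
        ......
        // "restart" as the last argument tells the new server to skip accept()
        execl("../server_test/server_test", "server_test", "2000", "restart", (char*)NULL);
        ......
    }
}
server_test/server.cpp:
if (!strcasecmp(argv[argc - 1], "restart"))
    ;   // restarted server: do not accept new connections
else
    ......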
RESTORE : targets of this instruction: the Backup process and the new Server.
The Backup process sends structure data to the new Server (which processes the message).
The code is discussed later.
1TO2/2TO1 : As above. Keep watching whether the chat continues normally.
(The code contains two more instructions, CTOS/STOC. They are similar to 1TO2/2TO1 and were used
in the old version; they are irrelevant here.)
Passing the fd alone is not enough, because the receiver cannot tell the descriptors apart. Luckily, structure
messages can be passed along with it using sendmsg/recvmsg.
The information that the data server backs up to the backup_process is a structure that includes the file
descriptor fd. In both server_test/ and backup_process/ there is:
pass_fd_struct_rw.h
typedef struct{
    SOCKET fd;      //socket fd
    int owner;      //client id
    int target;     //chatting peer
    int paired;     //is it a chat pair?
    int fd_orig;    //some original information
}fd_s;
extern fd_s fd_info[2];
Now let us look at BACKUP and RESTORE:
BACKUP:
(data) Server side:
server_test/client.cpp:
if(!strcasecmp(buf, "BACKUP"))
{
    ......
    int ret = to_backup(data_sock );
    if(ret < 0)
        printf("backup 1 error!\n");
    sleep(3);
    ret = 0;
    ret = to_backup(data_sock2);
    if(ret < 0)
        printf("backup 2 error!\n");
}
server_test/client.cpp:
int to_backup(SOCKET fd_to_backup)
{
    ......
    ret = write_fd_struct(sockfd, (fd_to_backup == fd_info[0].fd)? &fd_info[0] : &fd_info[1],
                          sizeof(fd_s) );
    ......
}
Backup process side:
backup_process/client.cpp:
if(!strcasecmp(buf, "BACKUP"))
{
    ......
    for(int i = 0; i < 2; i++)
    {
        int ret = do_backup();
        if(ret < 0)
            fprintf(stderr, "do_backup() %d return FAIL.\n", i+1);
        else
            printf("got passed fd seems OK.\n");
        if(i == 0)
            fd_backup = ret;
        else
            fd_backup2 = ret;
    }
}
backup_process/client.cpp:
SOCKET do_backup()
{
    ......
    static int count = 0;
    if(count >= 2)      // fd_info has only 2 slots; refuse to write past the end
        return -1;
    ret = read_fd_struct(fdaccept, &fd_info[count], sizeof(fd_s));
    ++count;
    ......
    return ret;
}
The two read/write functions are implemented in pass_fd_struct_rw.cpp. The file is present in both the
Server and the Backup program, and the two copies are identical. The code was downloaded from the web (see the
resources referenced below) and slightly modified.
For reference, there is unit test code for the read/write functions under the sub-directory
pass_fd_and_struct_test/, and there is pass_fd_rw.c under either server_test/ or backup_process/.
pass_fd_struct_rw.cpp:
#include <sys/socket.h>
#include <sys/uio.h>
#include <cstdio>
#include <cstring>
#include <iostream>
#include "pass_fd_struct_rw.h"
using std::cout;
using std::endl;

// Send *data as the ordinary payload; if data->fd is valid, also pass that
// descriptor as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error.
int write_fd_struct(int sock, fd_s* data, int size)
{
    msghdr msg;
    memset(&msg, 0, sizeof(msg));   // msg_name and msg_control start out as NULL
    // The control buffer lives at function scope so that it still exists
    // when sendmsg() runs. The union keeps it aligned for a cmsghdr.
    union {
        struct cmsghdr cm;
        char space[CMSG_SPACE(sizeof(int))];
    } cmsg;
    memset(&cmsg, 0, sizeof(cmsg));
    // init msg_control
    if(data->fd != -1){
        cmsg.cm.cmsg_level = SOL_SOCKET;
        cmsg.cm.cmsg_type = SCM_RIGHTS; // we are sending fd.
        cmsg.cm.cmsg_len = CMSG_LEN(sizeof(int));
        *(int *)CMSG_DATA(&cmsg.cm) = data->fd;
        msg.msg_control = &cmsg;
        msg.msg_controllen = sizeof(cmsg);
    }
    // init msg_iov
    iovec iov[1];
    iov[0].iov_base = data;
    iov[0].iov_len = size;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;
    if (sendmsg(sock, &msg, 0) == -1){
        cout << "[write_fd_struct] sendmsg error" << endl;
        return (-1);
    }
    return 0;
}

// Receive a structure into *data and replace data->fd with the descriptor
// installed in this process. Returns that descriptor, or -1 on error.
int read_fd_struct(int sock, fd_s* data, int size)
{
    msghdr msg;
    memset(&msg, 0, sizeof(msg));
    // msg_iov
    iovec iov[1];
    iov[0].iov_base = data;
    iov[0].iov_len = size;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;
    // msg_control
    union {                         // union to create suitably aligned memory
        struct cmsghdr cm;
        char space[CMSG_SPACE(sizeof(int))];
    } cmsg;
    memset(&cmsg, 0, sizeof(cmsg));
    msg.msg_control = &cmsg;
    msg.msg_controllen = sizeof(cmsg);
    if (recvmsg(sock, &msg, 0) <= 0) {  // -1: error, 0: peer closed the connection
        cout << "[read_fd_struct] recvmsg error" << endl;
        return (-1);
    }
    printf( "recvmsg() ends, data is: fd:%d\n", data->fd);
    // The fd number inside the payload is only meaningful in the sender; the
    // usable descriptor is the one delivered in the control message.
    cmsghdr* cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS) {
        cout << "[read_fd_struct] no fd in message" << endl;
        return (-1);
    }
    data->fd = *(int *)CMSG_DATA(cm);
    printf( "recvmsg() ends, after pass, fd turns to:%d, different?\n", data->fd);
    return data->fd;
}
RESTORE: the read/write direction of BACKUP is reversed.
Backup process side:
backup_process/client.cpp:
if(!strcasecmp(buf, "RESTORE"))
{
    ......
    int ret = do_restore(fd_backup2);
    if(ret < 0)
        fprintf(stderr, "do_restore() II return FAIL.\n");
    ret = do_restore(fd_backup);
    if(ret < 0)
        fprintf(stderr, "do_restore() I return FAIL.\n");
}
backup_process/client.cpp:
int do_restore(SOCKET fd)
{
    ......
    ret = write_fd_struct(fdaccept, (fd == fd_info[0].fd)? &fd_info[0] : &fd_info[1],
                          sizeof(fd_s) );
    ......
}
(data) Server side:
server_test/client.cpp:
if(!strcasecmp(buf, "RESTORE"))
{
    ......
    for(int i = 0; i < 2; i++)
    {
        sleep(3);
        int fd_restore = to_restore();
        if(fd_restore < 0)
            fprintf(stderr, "to_restore() return FAIL.\n");
    }
    // the entries may arrive in any order, so pick each socket by its owner
    data_sock = fd_info[0].owner == 1 ? fd_info[0].fd : fd_info[1].fd ;
    data_sock2 = fd_info[0].owner == 2 ? fd_info[0].fd : fd_info[1].fd ;
    *p_data_sock[0] = data_sock ;
    *p_data_sock[1] = data_sock2;
}
server_test/client.cpp:
int to_restore()
{
    ......
    static int count = 0;
    if(count >= 2)      // fd_info has only 2 slots; refuse to write past the end
        return -1;
    ret = read_fd_struct(sockfd, &fd_info[count], sizeof(fd_s));
    ++count;
    ......
    return ret;
}
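After a restore, the new server has to map each restored entry back to its client and should make sure the descriptor is really usable. The sketch below generalizes the two owner lookups in the RESTORE handler above to a table of any size and adds a validity check. The struct is a local copy of fd_s with SOCKET assumed to be int, and fd_is_open and fd_for_owner are hypothetical names.

```cpp
#include <fcntl.h>
#include <unistd.h>

typedef int SOCKET;                        // assumption: SOCKET is a plain int
typedef struct {
    SOCKET fd;                             // socket fd
    int owner;                             // client id
    int target;                            // chatting peer
    int paired;                            // is it a chat pair?
    int fd_orig;                           // some original information
} fd_s;

// True if `fd` refers to an open descriptor in this process.
bool fd_is_open(int fd)
{
    return fd >= 0 && fcntl(fd, F_GETFD) != -1;
}

// Returns the descriptor of the entry owned by client `owner`,
// or -1 if there is no such entry or its descriptor is not open.
int fd_for_owner(const fd_s* table, int n, int owner)
{
    for (int i = 0; i < n; i++)
        if (table[i].owner == owner && fd_is_open(table[i].fd))
            return table[i].fd;
    return -1;
}
```

Looking entries up by owner rather than by position matters here because the backup process sends the second descriptor first during RESTORE, so the order in fd_info after a restore need not match the order before the crash.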
Passing file descriptors is done with sendmsg/recvmsg. Some books explain it, and there are resources
about it on the web; some of them (in Chinese) are:
(How to pass file descriptors between processes)
(Passing file descriptors in advanced inter-process communication)
(Passing file descriptors between processes: unit one)
(Passing file descriptors between processes: unit two)
(Passing file descriptors between processes: unit three)
(Passing file descriptors between processes)
(Passing file descriptors between processes using Unix domain socket)
The book Advanced Programming in the UNIX Environment explains it. The third volume of
TCP/IP Illustrated analyzes its implementation. The book Unix Network Programming, volume 1, also
covers it. Other books exist as well, such as Linux Network Programming (in Chinese) by Jing-Bin Song,
Linux Socket Programming By Example by Warren Gay (which also has a Chinese edition), and
Linux Advanced Programming (in Chinese) by Zong-De Yang.
Similar mechanisms are said to exist on Windows: DuplicateHandle and WSADuplicateSocket.
Besides, so that the server program can be restarted immediately on the same port, I set the
SO_REUSEADDR option on the socket.
server_test/server.cpp:
int opt_val = 1;
setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt_val, sizeof(opt_val));
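For context, the option has to be set before bind() to have any effect on the bind. A minimal sketch of a listening socket created this way follows; the function name tcp_listen_reuse is hypothetical, and the backlog of 5 is an arbitrary choice for illustration.

```cpp
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <cstring>

// Hypothetical helper: create a TCP listening socket on `port` (all local
// addresses) with SO_REUSEADDR set. Returns the descriptor, or -1 on failure.
int tcp_listen_reuse(unsigned short port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    int opt_val = 1;
    // Must happen before bind(): it lets bind() succeed while connections
    // from the previous server instance are still in TIME_WAIT.
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &opt_val, sizeof(opt_val)) < 0) {
        close(s);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(s, (struct sockaddr*)&addr, sizeof(addr)) < 0 || listen(s, 5) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

Checking the return value of setsockopt, as done here, makes a failure visible instead of surfacing later as a puzzling "address already in use" error from bind().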
The code has been tested under both Redhat 9 and Ubuntu 14. (Under Cygwin it compiles, but the file descriptor
passing that Linux implements is not available there.)
Of course, the best thing is a server program written without any bugs that never crashes. But if the idea of this article
can be realized in practice, it will be a powerful tool.