ocfs2: fix cluster hang after a node dies
commit1c01967116
upstream. When a node dies, other live nodes have to choose a new master for an existed lock resource mastered by the dead node. As for ocfs2/dlm implementation, this is done by function - dlm_move_lockres_to_recovery_list which marks those lock rsources as DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM changes lock resource's master later. So without invoking dlm_move_lockres_to_recovery_list, no master will be choosed after dlm recovery accomplishment since no lock resource can be found through ::resource list. What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock resources mastered a dead node, it will break up synchronization among nodes. So invoke dlm_move_lockres_to_recovery_list again. Fixs: 'commitee8f7fcbe6
("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")' Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reported-by: Vitaly Mayatskih <v.mayatskih@gmail.com> Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit is contained in:
Родитель
bbce81fc96
Коммит
e250a19937
|
@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct dlm_ctxt *dlm, u8 dead_node)
|
|||
dlm_lockres_put(res);
|
||||
continue;
|
||||
}
|
||||
dlm_move_lockres_to_recovery_list(dlm, res);
|
||||
} else if (res->owner == dlm->node_num) {
|
||||
dlm_free_dead_locks(dlm, res, dead_node);
|
||||
__dlm_lockres_calc_usage(dlm, res);
|
||||
|
|
Загрузка…
Ссылка в новой задаче