This patch defines a new syscall fanotify_init() of the form:
int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)
This syscall is used to create and fanotify group. This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently if 2 events are going to be merged on the notication queue with
different masks the second event will be cloned and will replace the first
event. However if this notification queue is the only place referencing
the event in question there is no reason not to just update the event in
place. We can tell this if the event->refcnt == 1. Since we hold a
reference for each queue this event is on we know that when refcnt == 1
this is the only queue. The other concern is that it might be about to be
added to a new queue, but this can't be the case since fsnotify holds a
reference on the event until it is finished adding it to queues.
Signed-off-by: Eric Paris <eparis@redhat.com>
Instead of just merging fanotify events if they are exactly the same, merge
notification events with different masks. To do this we have to clone the
old event, update the mask in the new event with the new merged mask, and
put the new event in place of the old event.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify listeners get an open file descriptor to the object in question so
the ordering of operations is not as important as in other notification
systems. inotify will drop events if the last event in the event FIFO is
the same as the current event. This patch will drop fanotify events if
they are the same as another event anywhere in the event FIFO.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question. This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification. These are useful for on access scanners or hierachical storage
management schemes.
This patch just implements the basics of the fsnotify functions.
Signed-off-by: Eric Paris <eparis@redhat.com>
sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
something unique.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This is a new f_mode which can only be set by the kernel. It indicates
that the fd was opened by fanotify and should not cause future fanotify
events. This is needed to prevent fanotify livelock. An example of
obvious livelock is from fanotify close events.
Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?
The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify. Thus when that file is used it will not generate future
events.
This patch simply defines the bit.
Signed-off-by: Eric Paris <eparis@redhat.com>
All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call. Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
nomenclature change. Used to call things 'entries' but now we just call
them 'marks.' Do those changes for dnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
previously I used mark_entry when talking about marks on inodes. The
_entry is pretty useless. Just use "mark" instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Some fsnotify operations send a struct file. This is more information than
we technically need. We instead send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)
Signed-off-by: Eric Paris <eparis@redhat.com>
vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Signed-off-by: Eric Paris <eparis@redhat.com>
The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy. This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>
To ensure that a group will not duplicate events when it receives it based
on the vfsmount and the inode should_send_event test we should distinguish
those two cases. We pass a vfsmount to this function so groups can make
their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>
currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.) This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given monut point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.
The reason we want a second group list and mask is because the inode based
notification should_send_event support which makes each group look for a mark
on the given inode. With one nfsmount listener that means that every group would
have to take the inode->i_lock, look for their mark, not find one, and return
for every operation. By seperating vfsmount from inode listeners only when
there is a inode listener will the inode groups have to look for their
mark and take the inode lock. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation. This means, even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event. This patch adds groups to the global list on
when the first inode mark is added to the group.
Signed-of-by: Eric Paris <eparis@redhat.com>
Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list. This isn't strictly the case, we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0) This patch moves the initialization stuff and makes it clear when
it is really being held.
Signed-off-by: Eric Paris <eparis@redhat.com>
Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently the audit watch group always sets a mask equal to all events it
might care about. We instead should only set the group mask if we are
actually watching inodes. This should be a perf win when audit watches are
compiled in.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_obtain_group was intended to be able to find an already existing
group. Nothing uses that functionality. This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_obtain_group uses kzalloc but then proceedes to set things to 0.
This patch just deletes those useless lines.
Signed-off-by: Eric Paris <eparis@redhat.com>
The original fsnotify interface has a group-num which was intended to be
able to find a group after it was added. I no longer think this is a
necessary thing to do and so we remove the group_num.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_replace_event need to lock both the old and the new event. This
causes lockdep to get all pissed off since it dosn't know this is safe.
It's safe in this case since the new event is impossible to be reached from
other places in the kernel.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event. This patch implements the replace functionality of that
process.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller. Since events may be in use by multiple fsnotify
groups simultaneously certain event entries (such as the mask) cannot be
changed after the event was created. Since fanotify would like to merge
events happening on the same file it needs a new clean event to work with
so it can change any fields it wishes.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify only wishes to merge a new event with the last event on the
notification fifo. fanotify is willing to merge any events including by
means of bitwise OR masks of multiple events together. This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code. This allows each use of fsnotify to provide
their own merge functionality.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notification we have the entire file, send that so
fanotify can use it.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify, the upcoming notification system actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process. Close was the only operation that already was passing
a struct file to the notification hook. This patch passes a file for access,
modify, and open as well as they are easily available to these hooks.
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is going to need to look at file->private_data to know if an event
should be sent or not. This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available
Signed-off-by: Eric Paris <eparis@redhat.com>
fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener. Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify was supposed to have a dmesg printk ratelimitor which would cause
inotify to only emit one message per boot. The static bool was never set
so it kept firing messages. This patch correctly limits warnings in multiple
places.
Signed-off-by: Eric Paris <eparis@redhat.com>
Prior to 2.6.31 inotify would not reuse watch descriptors until all of
them had been used at least once. After the rewrite inotify would reuse
watch descriptors. The selinux utility 'restorecond' was found to have
problems when watch descriptors were reused. This patch reverts to the
pre inotify rewrite behavior to not reuse watch descriptors.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify event initialization is done entry by entry with almost everything
set to either 0 or NULL. Use kmem_cache_zalloc and only initialize things
that need non-zero initialization. Also means we don't have to change
initialization entries based on the config options.
Signed-off-by: Eric Paris <eparis@redhat.com>
inotify_free_mark casts directly from an fsnotify_mark_entry to an
inotify_inode_mark_entry. This works, but should use container_of instead
for future proofing.
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently fsnotify defines a static fsnotify event which is sent when a
group overflows its allotted queue length. This patch just allocates that
event from the event cache rather than defining it statically. There is no
known reason that the current implementation is wrong, but this makes sure the
event is initialized and created like any other.
Signed-off-by: Eric Paris <eparis@redhat.com>
Audit watch init and fsnotify init both use subsys_initcall() but since the
audit watch code is linked in before the fsnotify code the audit watch code
would be using the fsnotify srcu struct before it was initialized. This
patch fixes that problem by moving audit watch init to device_initcall() so
it happens after fsnotify is ready.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Paris <eparis@redhat.com>
Tested-by : Sachin Sant <sachinp@in.ibm.com>
Audit watch should depend on CONFIG_AUDIT_SYSCALL and should select
FSNOTIFY. This splits the spagetti like mixing of audit_watch and
audit_filter code so they can be configured seperately.
Signed-off-by: Eric Paris <eparis@redhat.com>
CONFIG_AUDIT builds audit_watches which depend on fsnotify. Make
CONFIG_AUDIT select fsnotify.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Simply switch audit_trees from using inotify to using fsnotify for it's
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch allows a task to add a second fsnotify mark to an inode for the
same group. This mark will be added to the end of the inode's list and
this will never be found by the stand fsnotify_find_mark() function. This
is useful if a user wants to add a new mark before removing the old one.
Signed-off-by: Eric Paris <eparis@redhat.com>
Simple copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.
Signed-off-by: Eric Paris <eparis@redhat.com>
deleting audit watch rules is not currently done under audit_filter_mutex.
It was done this way because we could not hold the mutex during inotify
manipulation. Since we are using fsnotify we don't need to do the extra
get/put pair nor do we need the private list on which to store the parents
while they are about to be freed.
Signed-off-by: Eric Paris <eparis@redhat.com>
fsnotify can handle mutexes to be held across all fsnotify operations since
it deals strickly in spinlocks. This can simplify and reduce some of the
audit_filter_mutex taking and dropping.
Signed-off-by: Eric Paris <eparis@redhat.com>