Eric W. Biederman says:
====================
Using dev_kfree_skb_any for functions called in multiple contexts
This patchset should be an uncontroversial set of changes to change
dev_kfree_skb to dev_kfree_skb_any for code paths that are called in
hard irq contexts in addition to other contexts. netpoll is the reason
this code gets called in multiple contexts.
There is more coming but these changes are a good starting place, and
stand on their own.
Since the last round changes to the rx path have been removed netpoll
will changed to avoid that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
netpoll can call functions in hard irq context that are ordinarily
called in lesser contexts. For those functions use dev_kfree_skb_any
and dev_consume_skb_any so skbs are freed safely from hard irq
context.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as
which can be called in hard irq by netpoll, softirq context
by normal napi polling, and in normal sleepable context
by the network device close method.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit
as it can be called in both hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VLAN is added by user, adapter->vlans_added is incremented.
But if the VLAN is already programmed in HW, driver ends up
incrementing the counter wrongly.
Increment the counter only if VLAN is not already programmed in the HW.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the driver creates only a single TXQ on any BE3-R multi-channel
interface.
This patch changes this and creates multiple TXQs on RSS-capable multi-channel
BE3-R interfaces. This change helps improve the TX pps performance on the
affected interface.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allocation size must be be_max_uc() and not "be_max_uc() + 1"
Signed-off-by: Ravikumar Nelavelli <ravikumar.nelavelli@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ravikumar Nelavelli <ravikumar.nelavelli@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to control VF's link state by implementing the
ndo_set_vf_link_state() hook.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use GET_PROFILE_CONFIG_V1 cmd even for BE3-R (it's already used for
Lancer-R and Skyhawk-R), to query max-vfs value supported by the FW.
This is needed as on some configs, the value exported in the PCI-config
space is not accurate.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
mlx4: Add SRIOV support for RoCE
This series adds SRIOV support for RoCE (RDMA over Ethernet) to the mlx4 driver.
The patches are against net-next, as of commit 2d8d40a "pkt_sched: fq:
do not hold qdisc lock while allocating memory"
changes from V1:
- addressed feedback from Dave on patch #3 and changed get_real_sgid_index()
to be called fill_in_real_sgid_index() and be a void function.
- removed some checkpatch warnings on long lines
changes from V0:
- always check the return code of mlx4_get_roce_gid_from_slave().
The call we fixed is introduced in patch #1 and later removed by
patch #3 that allows guests to have multiple GIDS. The 1..3
separation was done for proper division of patches to logical changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To activate RoCE/SRIOV, need to remove the following:
1. In mlx4_ib_add, need to remove the error return preventing
initialization of a RoCE port under SRIOV.
2. In update_vport_qp_params (in resource_tracker.c) need to remove
the error return when a RoCE RC or UD qp is detected.
This error return causes the INIT-to-RTR qp transition to fail
in the wrapper function under RoCE/SRIOV.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID
in multiplex_cm_handler and demux_cm_handler.
* Handle Service ID Resolution messages and REQ messages
separately, for their formats are different.
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there is no connection between the MAC/VLAN and the GID
when using IP-based addressing, the proxy QP1 (running on the
slave) must pass the source-mac, destination-mac, and vlan_id
information separately from the GID. Additionally, the Host
must pass the remote source-mac and vlan_id back to the slave,
This is achieved as follows:
Outgoing MADs:
1. Source MAC: obtained from the CQ completion structure
(struct ib_wc, smac field).
2. Destination MAC: obtained from the tunnel header
3. vlan_id: obtained from the tunnel header.
Incoming MADs
1. The source (i.e., remote) MAC and vlan_id are passed in
the tunnel header to the proxy QP1.
VST mode support:
For outgoing MADs, the vlan_id obtained from the header is
discarded, and the vlan_id specified by the Hypervisor is used
instead.
For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the
"invalid" vlan (0xffff) is substituted when forwarding to the slave.
Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IB side of RoCE requires the MAC table index of the
MAC address used by its QPs.
To obtain the real MAC index, the IB side registers the
MAC (increasing its ref count, and also returning the
real MAC index) during the modify-qp sequence.
This protects against the ETH side deleting or modifying
that MAC table entry while the QP is active.
Note that until the modify-qp command returns success,
the MAC and VLAN information only has "candidate" status.
If the modify-qp succeeds, the "candidate" info is promoted
to the operational MAC/VLAN info for the qp. If the modify fails,
the candidate MAC/VLAN is unregistered, and the old qp info
is preserved.
The patch is a bit complex, because there are multiple qp
transitions where the primary-path information may be
modified: INIT-to-RTR, and SQD-to-SQD.
Similarly for the alternate path information.
Therefore the code must handle cases where path information
has already been entered into the QP context by previous
qp transitions.
For the MAC address, the success logic is as follows:
1. If there was no previous MAC, simply move the candidate
MAC information to the operational information, and reset
the candidate MAC info.
2. If there was a previous MAC, unregister it. Then move
the MAC information from candidate to operational, and
reset the candidate info (as in 1. above).
The MAC address failure logic is the same for all cases:
- Unregister the candidate MAC, and reset the candidate MAC info.
For Vlan registration, the logic is similar.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GIDs are statically distributed, as follows:
PF: gets 16 GIDs
VFs: Remaining GIDS are divided evenly between VFs activated by the driver.
If the division is not even, lower-numbered VFs get an extra GID.
For an IB interface, the number of gids per guest remains as before: one gid per guest.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For IB transport, the host determines the slave GIDs. For ETH (RoCE),
however, the slave's GID is determined by the IP address that the slave
itself assigns to the ETH device used by RoCE.
In this case, the slave must be able to write its GIDs to the HCA gid table
(at the GID indices that slave "owns").
This commit adds processing for the SET_PORT_GID_TABLE opcode modifier
for the SET_PORT command wrapper (so that slaves may modify their GIDS
for RoCE).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This requires the following modifications:
1. Fix build_mlx4_header to properly fill in the ETH fields
2. Adjust mux and demux QP1 flow to support RoCE.
This commit still assumes only one GID per slave for RoCE.
The commit enabling multiple GIDs is a subsequent commit, and
is done separately because of its complexity.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: simplifications in socket and port layer
After the removal of the tipc native API the relation between
tipc_port and its API types is strictly one-to-one, i.e, the latter
can now only be a socket API. This change opens up for
simplifications both in the code, data and locking structure.
We start with this series, where we ensure that port and socket
structures are co-allocated. Note that the first commit in the
series is unrelated to the above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As an artefact from the native interface, the message sending functions
in the port takes a port ref as first parameter, and then looks up in
the registry to find the corresponding port pointer. This despite the
fact that the only currently existing caller, tipc_sock, already knows
this pointer.
We change the signature of these functions to take a struct tipc_port*
argument, and remove the redundant lookups.
We also remove an unmotivated extra lookup in the function
socket.c:auto_connect(), and, as the lookup functions tipc_port_deref()
and ref_deref() now become unused, we remove these two functions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The practice of naming variables in TIPC is inconistent, sometimes
even within the same file.
In this commit we align variable names and declarations within
socket.c, and function and macro names within socket.h. We also
reduce the number of conversion macros to two, in order to make
usage less obsure.
These changes are purely cosmetic.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The three functions tipc_portimportance(), tipc_portunreliable() and
tipc_portunreturnable() and their corresponding tipc_set* functions,
are all grabbing port_lock when accessing the targeted port. This is
unnecessary in the current code, since these calls only are made from
within socket downcalls, already protected by sock_lock.
We remove the redundant locking. Also, since the functions now become
trivial one-liners, we move them to port.h and make them inline.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the original one-to-many relation between port and user API
layers, upcalls to the API have been performed via function pointers,
installed in struct tipc_port at creation. Since this relation now
always is one-to-one, we can instead use ordinary function calls.
We remove the function pointers 'dispatcher' and ´wakeup' from
struct tipc_port, and replace them with calls to the renamed
functions tipc_sk_rcv() and tipc_sk_wakeup().
At the same time we change the name and signature of the functions
tipc_createport() and tipc_deleteport() to reflect their new role
as mere initialization/destruction functions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the removal of the tipc native API the relation between
a tipc_port and its API types is strictly one-to-one, i.e, the
latter can now only be a socket API. There is therefore no need
to allocate struct tipc_port and struct sock independently.
In this commit, we aggregate struct tipc_port into struct tipc_sock,
hence saving both CPU cycles and structure complexity.
There are no functional changes in this commit, except for the
elimination of the separate allocation/freeing of tipc_port.
All other changes are just adaptatons to the new data structure.
This commit also opens up for further code simplifications and
code volume reduction, something we will do in later commits.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field 'peer_name' in struct tipc_sock is redundant, since
this information already is available from tipc_port, to which
tipc_sock has a reference.
We remove the field, and ensure that peer node and peer port
info instead is fetched via the functions that already exist
for this purpose.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lock for protecting the reference table is declared as an
RWLOCK, although it is only used in write mode, never in read
mode.
We redefine it to become a spinlock.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We leak an active timer, the hotcpu notifier and all allocated
resources when we exit a namespace. Fix this by introducing a
flow_cache_fini() function where we release the resources before
we exit.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
priv is not instantiated at gfar_of_init() time, when
parsing the DT for info on supported HW queues. Before
the netdev can be allocated, the number of supported
queues must be known. Because the number of supported
queues depends on device type, move the compatibility
checks before netdev allocation. Local vars are used
to hold the operation mode info before netdev allocation.
This fixes the null accesses for priv->.., in gfar_of_init.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rf233 and rf231 are sufficiently similar that we can treat
rf233 like rf231.
rf233 is missing some features that rf231 has, but we don't currently
make use of them so there's nothing to handle differently yet.
Should we add support in the future for rf231 *_NOCLK or SLEEP states,
or PAD_IO drive strength, exceptions will need to be made for rf233.
Signed-off-by: Thomas Stilwell <stilwellt@openlabs.co>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
pkt_sched: allow scheduling points
We have seen delays of more than 50ms in class or qdisc dumps, in case
device is under high TX stress, even with the prior 4KB per skb limit.
With the new 16KB limit, this could translate to 200ms delays.
Add cond_resched() to give a chance to higher prio tasks to get cpu.
But before doing so, we need to remove the rcu locking from tc_dump_qdisc()
as David spotted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We have seen delays of more than 50ms in class or qdisc dumps, in case
device is under high TX stress, even with the prior 4KB per skb limit.
Add cond_resched() to give a chance to higher prio tasks to get cpu.
Signed-off-by; Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(),
so we do not need to use rcu protection to protect list of netdevices.
This will allow preemption to occur, thus reducing latencies.
Following patch adds explicit cond_resched() calls.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The option code is taking IP address and putting it into a generic
container. Force cast to silence sparse warnings.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes sparse warnings in vlan driver.
It propagates the sparse __percpu attribute from alloc_percpu
into netdev_alloc_pcpu_stats. I expect it may trigger additional
sparse warnings from other drivers that are missing the __percpu
attribute.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>