New IDT NTB driver and changes to the NTB infrastructure to allow for

this different kind of NTB HW, some style fixes (per Greg KH recommendation), and some ntb_test tweaks. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZZowhAAoJEG5mS6x6i9IjNx8P/0DTYNqBRXg4PSET7+6/Z6yq Gz09DRw42dwHXN70ytEwtdRP7jcp3agrvBpRty3EgBygpLlMfOA0iQ6crVdBaza0 XzRuSl1iEd8oSX3CmJN8dIIWueNiPiqvG6as+j1kjZRZst21oqiMLTO3uO37oBh+ 7/ltCkQtQdco74fiaqnN1iDeKK2lj3qHPJxb2AbeGIxj3lOr6NttPifd0zovimtb keBCbaYyQaTxDoNzRwQmZtQNB2bwC/2V1sAM5L1T/xkVIBWXH8AUqJO1iVC4ptjI VURvxitDMo28JDc4hgO63AVHsvz8zLbgKsCNK0ia+rtalSLJkDBBN8Qaf7e10hbh 6YUg2eXzIGusdvFowKnOdXhHYAqrQXMBcWgM2ZVe1YBj8QqZKQ0LL4h9ooVuP9HV 2hZc3GfMT+iUGmQKdk+rtDVLxYNnsng1NT/5ZxCHBQptrb+/Rqey/wDYdPV2TpnD JfRpIFny29NOu74sa4W5GEPWI3AL7WAPAA1WH7RHy5rVDPfX20DcUQdvKvpo25y8 IdH7caSiBrjpc2GQPvkaziHlLLdZoRNwqJnmHDyUatukdmrElV/oTIRhaYQmLMK0 /xTFS7YXsmiOGqlfILW9VXdZIlLXSKuNSPX1HUcJ91l2Ik7m0PiK382jKOJVXqcZ Frt1CgnCclCDYHBD4yfm =SG8F -----END PGP SIGNATURE----- Merge tag 'ntb-4.13' of git://github.com/jonmason/ntb Pull NTB updates from Jon Mason: "The major change in the series is a rework of the NTB infrastructure to all for IDT hardware to be supported (and resulting fallout from that). There are also a few clean-ups, etc. New IDT NTB driver and changes to the NTB infrastructure to allow for this different kind of NTB HW, some style fixes (per Greg KH recommendation), and some ntb_test tweaks" * tag 'ntb-4.13' of git://github.com/jonmason/ntb: ntb_netdev: set the net_device's parent ntb: Add error path/handling to Debug FS entry creation ntb: Add more debugfs support for ntb_perf testing options ntb: Remove debug-fs variables from the context structure ntb: Add a module option to control affinity of DMA channels NTB: Add IDT 89HPESxNTx PCIe-switches support ntb_hw_intel: Style fixes: open code macros that just obfuscate code ntb_hw_amd: Style fixes: open code macros that just obfuscate code NTB: Add ntb.h comments NTB: Add PCIe Gen4 link speed NTB: Add new Memory Windows API documentation NTB: Add Messaging NTB API NTB: Alter Scratchpads API to support multi-ports devices NTB: Alter MW API to support multi-ports devices NTB: Alter link-state API to support multi-port devices NTB: Add indexed ports NTB API NTB: Make link-state API being declared first NTB: ntb_test: add parameter for doorbell bitmask NTB: ntb_test: modprobe on remote host
2017-07-14 13:31:52 -07:00 · 2017-07-14 13:31:52 -07:00 · ccd5d1b91f
--- a/Documentation/ntb.txt
+++ b/Documentation/ntb.txt
@ -1,14 +1,16 @@
 # NTB Drivers

 NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
-the separate memory systems of two computers to the same PCI-Express fabric.
-Existing NTB hardware supports a common feature set, including scratchpad
-registers, doorbell registers, and memory translation windows.  Scratchpad
-registers are read-and-writable registers that are accessible from either side
-of the device, so that peers can exchange a small amount of information at a
-fixed address.  Doorbell registers provide a way for peers to send interrupt
-events.  Memory windows allow translated read and write access to the peer
-memory.
+the separate memory systems of two or more computers to the same PCI-Express
+fabric. Existing NTB hardware supports a common feature set: doorbell
+registers and memory translation windows, as well as non common features like
+scratchpad and message registers. Scratchpad registers are read-and-writable
+registers that are accessible from either side of the device, so that peers can
+exchange a small amount of information at a fixed address. Message registers can
+be utilized for the same purpose. Additionally they are provided with with
+special status bits to make sure the information isn't rewritten by another
+peer. Doorbell registers provide a way for peers to send interrupt events.
+Memory windows allow translated read and write access to the peer memory.

 ## NTB Core Driver (ntb)

@ -26,6 +28,87 @@ as ntb hardware, or hardware drivers, are inserted and removed.  The
 registration uses the Linux Device framework, so it should feel familiar to
 anyone who has written a pci driver.

+### NTB Typical client driver implementation
+
+Primary purpose of NTB is to share some peace of memory between at least two
+systems. So the NTB device features like Scratchpad/Message registers are
+mainly used to perform the proper memory window initialization. Typically
+there are two types of memory window interfaces supported by the NTB API:
+inbound translation configured on the local ntb port and outbound translation
+configured by the peer, on the peer ntb port. The first type is
+depicted on the next figure
+
+Inbound translation:
+ Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
+  ____________
+ | dma-mapped |-ntb_mw_set_trans(addr)  |
+ | memory     |        _v____________   |   ______________
+ | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
+ |------------|       |--------------|  |  |--------------|
+
+So typical scenario of the first type memory window initialization looks:
+1) allocate a memory region, 2) put translated address to NTB config,
+3) somehow notify a peer device of performed initialization, 4) peer device
+maps corresponding outbound memory window so to have access to the shared
+memory region.
+
+The second type of interface, that implies the shared windows being
+initialized by a peer device, is depicted on the figure:
+
+Outbound translation:
+ Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
+  ____________                      ______________
+ | dma-mapped |                |   | MW base addr |<== memory-mapped IO
+ | memory     |                |   |--------------|
+ | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
+ |------------|                |   |--------------|
+
+Typical scenario of the second type interface initialization would be:
+1) allocate a memory region, 2) somehow deliver a translated address to a peer
+device, 3) peer puts the translated address to NTB config, 4) peer device maps
+outbound memory window so to have access to the shared memory region.
+
+As one can see the described scenarios can be combined in one portable
+algorithm.
+ Local device:
+  1) Allocate memory for a shared window
+  2) Initialize memory window by translated address of the allocated region
+     (it may fail if local memory window initialization is unsupported)
+  3) Send the translated address and memory window index to a peer device
+ Peer device:
+  1) Initialize memory window with retrieved address of the allocated
+     by another device memory region (it may fail if peer memory window
+     initialization is unsupported)
+  2) Map outbound memory window
+
+In accordance with this scenario, the NTB Memory Window API can be used as
+follows:
+ Local device:
+  1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
+     be allocated for memory windows between local device and peer device
+     of port with specified index.
+  2) ntb_get_align(pidx, midx) - retrieve parameters restricting the
+     shared memory region alignment and size. Then memory can be properly
+     allocated.
+  3) Allocate physically contiguous memory region in compliance with
+     restrictions retrieved in 2).
+  4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
+     the memory window with specified index for the defined peer device
+     (it may fail if local translated address setting is not supported)
+  5) Send translated base address (usually together with memory window
+     number) to the peer device using, for instance, scratchpad or message
+     registers.
+ Peer device:
+  1) ntb_peer_mw_set_trans(pidx, midx) - try to set received from other
+     device (related to pidx) translated address for specified memory
+     window. It may fail if retrieved address, for instance, exceeds
+     maximum possible address or isn't properly aligned.
+  2) ntb_peer_mw_get_addr(widx) - retrieve MMIO address to map the memory
+     window so to have an access to the shared memory.
+
+Also it is worth to note, that method ntb_mw_count(pidx) should return the
+same value as ntb_peer_mw_count() on the peer with port index - pidx.
+
 ### NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)

 The primary client for NTB is the Transport client, used in tandem with NTB
--- a/6
+++ b/6
@ -9381,6 +9381,12 @@ F:	include/linux/ntb.h
 F:	include/linux/ntb_transport.h
 F:	tools/testing/selftests/ntb/

+NTB IDT DRIVER
+M:	Serge Semin <fancer.lancer@gmail.com>
+L:	linux-ntb@googlegroups.com
+S:	Supported
+F:	drivers/ntb/hw/idt/
+
 NTB INTEL DRIVER
 M:	Jon Mason <jdmason@kudzu.us>
 M:	Dave Jiang <dave.jiang@intel.com>
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@ -418,6 +418,8 @@ static int ntb_netdev_probe(struct device *client_dev)
 	if (!ndev)
 		return -ENOMEM;

+	SET_NETDEV_DEV(ndev, client_dev);
+
 	dev = netdev_priv(ndev);
 	dev->ndev = ndev;
 	dev->pdev = pdev;
--- a/drivers/ntb/hw/Kconfig
+++ b/drivers/ntb/hw/Kconfig
@ -1,2 +1,3 @@
 source "drivers/ntb/hw/amd/Kconfig"
+source "drivers/ntb/hw/idt/Kconfig"
 source "drivers/ntb/hw/intel/Kconfig"
--- a/drivers/ntb/hw/Makefile
+++ b/drivers/ntb/hw/Makefile
@ -1,2 +1,3 @@
 obj-$(CONFIG_NTB_AMD)	+= amd/
+obj-$(CONFIG_NTB_IDT)	+= idt/
 obj-$(CONFIG_NTB_INTEL)	+= intel/
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@ -5,6 +5,7 @@
 *   GPL LICENSE SUMMARY
 *
 *   Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of version 2 of the GNU General Public License as
@ -13,6 +14,7 @@
 *   BSD LICENSE
 *
 *   Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
@ -79,40 +81,42 @@ static int ndev_mw_to_bar(struct amd_ntb_dev *ndev, int idx)
 	return 1 << idx;
 }

-static int amd_ntb_mw_count(struct ntb_dev *ntb)
+static int amd_ntb_mw_count(struct ntb_dev *ntb, int pidx)
 {
+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	return ntb_ndev(ntb)->mw_count;
 }

-static int amd_ntb_mw_get_range(struct ntb_dev *ntb, int idx,
-				phys_addr_t *base,
-				resource_size_t *size,
-				resource_size_t *align,
-				resource_size_t *align_size)
+static int amd_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
+				resource_size_t *addr_align,
+				resource_size_t *size_align,
+				resource_size_t *size_max)
 {
 	struct amd_ntb_dev *ndev = ntb_ndev(ntb);
 	int bar;

+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	bar = ndev_mw_to_bar(ndev, idx);
 	if (bar < 0)
 		return bar;

-	if (base)
-		*base = pci_resource_start(ndev->ntb.pdev, bar);
+	if (addr_align)
+		*addr_align = SZ_4K;

-	if (size)
-		*size = pci_resource_len(ndev->ntb.pdev, bar);
+	if (size_align)
+		*size_align = 1;

-	if (align)
-		*align = SZ_4K;
-
-	if (align_size)
-		*align_size = 1;
+	if (size_max)
+		*size_max = pci_resource_len(ndev->ntb.pdev, bar);

 	return 0;
 }

-static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
+static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
 				dma_addr_t addr, resource_size_t size)
 {
 	struct amd_ntb_dev *ndev = ntb_ndev(ntb);
@ -122,11 +126,14 @@ static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
 	u64 base_addr, limit, reg_val;
 	int bar;

+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	bar = ndev_mw_to_bar(ndev, idx);
 	if (bar < 0)
 		return bar;

-	mw_size = pci_resource_len(ndev->ntb.pdev, bar);
+	mw_size = pci_resource_len(ntb->pdev, bar);

 	/* make sure the range fits in the usable mw size */
 	if (size > mw_size)
@ -135,7 +142,7 @@ static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
 	mmio = ndev->self_mmio;
 	peer_mmio = ndev->peer_mmio;

-	base_addr = pci_resource_start(ndev->ntb.pdev, bar);
+	base_addr = pci_resource_start(ntb->pdev, bar);

 	if (bar != 1) {
 		xlat_reg = AMD_BAR23XLAT_OFFSET + ((bar - 2) << 2);
@ -212,7 +219,7 @@ static int amd_link_is_up(struct amd_ntb_dev *ndev)
 	return 0;
 }

-static int amd_ntb_link_is_up(struct ntb_dev *ntb,
+static u64 amd_ntb_link_is_up(struct ntb_dev *ntb,
 			      enum ntb_speed *speed,
 			      enum ntb_width *width)
 {
@ -225,7 +232,7 @@ static int amd_ntb_link_is_up(struct ntb_dev *ntb,
 		if (width)
 			*width = NTB_LNK_STA_WIDTH(ndev->lnk_sta);

-		dev_dbg(ndev_dev(ndev), "link is up.\n");
+		dev_dbg(&ntb->pdev->dev, "link is up.\n");

 		ret = 1;
 	} else {
@ -234,7 +241,7 @@ static int amd_ntb_link_is_up(struct ntb_dev *ntb,
 		if (width)
 			*width = NTB_WIDTH_NONE;

-		dev_dbg(ndev_dev(ndev), "link is down.\n");
+		dev_dbg(&ntb->pdev->dev, "link is down.\n");
 	}

 	return ret;
@ -254,7 +261,7 @@ static int amd_ntb_link_enable(struct ntb_dev *ntb,

 	if (ndev->ntb.topo == NTB_TOPO_SEC)
 		return -EINVAL;
-	dev_dbg(ndev_dev(ndev), "Enabling Link.\n");
+	dev_dbg(&ntb->pdev->dev, "Enabling Link.\n");

 	ntb_ctl = readl(mmio + AMD_CNTL_OFFSET);
 	ntb_ctl |= (PMM_REG_CTL | SMM_REG_CTL);
@ -275,7 +282,7 @@ static int amd_ntb_link_disable(struct ntb_dev *ntb)

 	if (ndev->ntb.topo == NTB_TOPO_SEC)
 		return -EINVAL;
-	dev_dbg(ndev_dev(ndev), "Enabling Link.\n");
+	dev_dbg(&ntb->pdev->dev, "Enabling Link.\n");

 	ntb_ctl = readl(mmio + AMD_CNTL_OFFSET);
 	ntb_ctl &= ~(PMM_REG_CTL | SMM_REG_CTL);
@ -284,6 +291,31 @@ static int amd_ntb_link_disable(struct ntb_dev *ntb)
 	return 0;
 }

+static int amd_ntb_peer_mw_count(struct ntb_dev *ntb)
+{
+	/* The same as for inbound MWs */
+	return ntb_ndev(ntb)->mw_count;
+}
+
+static int amd_ntb_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
+				    phys_addr_t *base, resource_size_t *size)
+{
+	struct amd_ntb_dev *ndev = ntb_ndev(ntb);
+	int bar;
+
+	bar = ndev_mw_to_bar(ndev, idx);
+	if (bar < 0)
+		return bar;
+
+	if (base)
+		*base = pci_resource_start(ndev->ntb.pdev, bar);
+
+	if (size)
+		*size = pci_resource_len(ndev->ntb.pdev, bar);
+
+	return 0;
+}
+
 static u64 amd_ntb_db_valid_mask(struct ntb_dev *ntb)
 {
 	return ntb_ndev(ntb)->db_valid_mask;
@ -400,30 +432,30 @@ static int amd_ntb_spad_write(struct ntb_dev *ntb,
 	return 0;
 }

-static u32 amd_ntb_peer_spad_read(struct ntb_dev *ntb, int idx)
+static u32 amd_ntb_peer_spad_read(struct ntb_dev *ntb, int pidx, int sidx)
 {
 	struct amd_ntb_dev *ndev = ntb_ndev(ntb);
 	void __iomem *mmio = ndev->self_mmio;
 	u32 offset;

-	if (idx < 0 || idx >= ndev->spad_count)
+	if (sidx < 0 || sidx >= ndev->spad_count)
 		return -EINVAL;

-	offset = ndev->peer_spad + (idx << 2);
+	offset = ndev->peer_spad + (sidx << 2);
 	return readl(mmio + AMD_SPAD_OFFSET + offset);
 }

-static int amd_ntb_peer_spad_write(struct ntb_dev *ntb,
-				   int idx, u32 val)
+static int amd_ntb_peer_spad_write(struct ntb_dev *ntb, int pidx,
+				   int sidx, u32 val)
 {
 	struct amd_ntb_dev *ndev = ntb_ndev(ntb);
 	void __iomem *mmio = ndev->self_mmio;
 	u32 offset;

-	if (idx < 0 || idx >= ndev->spad_count)
+	if (sidx < 0 || sidx >= ndev->spad_count)
 		return -EINVAL;

-	offset = ndev->peer_spad + (idx << 2);
+	offset = ndev->peer_spad + (sidx << 2);
 	writel(val, mmio + AMD_SPAD_OFFSET + offset);

 	return 0;
@ -431,8 +463,10 @@ static int amd_ntb_peer_spad_write(struct ntb_dev *ntb,

 static const struct ntb_dev_ops amd_ntb_ops = {
 	.mw_count		= amd_ntb_mw_count,
-	.mw_get_range		= amd_ntb_mw_get_range,
+	.mw_get_align		= amd_ntb_mw_get_align,
 	.mw_set_trans		= amd_ntb_mw_set_trans,
+	.peer_mw_count		= amd_ntb_peer_mw_count,
+	.peer_mw_get_addr	= amd_ntb_peer_mw_get_addr,
 	.link_is_up		= amd_ntb_link_is_up,
 	.link_enable		= amd_ntb_link_enable,
 	.link_disable		= amd_ntb_link_disable,
@ -466,18 +500,19 @@ static void amd_ack_smu(struct amd_ntb_dev *ndev, u32 bit)
 static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
 {
 	void __iomem *mmio = ndev->self_mmio;
+	struct device *dev = &ndev->ntb.pdev->dev;
 	u32 status;

 	status = readl(mmio + AMD_INTSTAT_OFFSET);
 	if (!(status & AMD_EVENT_INTMASK))
 		return;

-	dev_dbg(ndev_dev(ndev), "status = 0x%x and vec = %d\n", status, vec);
+	dev_dbg(dev, "status = 0x%x and vec = %d\n", status, vec);

 	status &= AMD_EVENT_INTMASK;
 	switch (status) {
 	case AMD_PEER_FLUSH_EVENT:
-		dev_info(ndev_dev(ndev), "Flush is done.\n");
+		dev_info(dev, "Flush is done.\n");
 		break;
 	case AMD_PEER_RESET_EVENT:
 		amd_ack_smu(ndev, AMD_PEER_RESET_EVENT);
@ -503,7 +538,7 @@ static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
 		status = readl(mmio + AMD_PMESTAT_OFFSET);
 		/* check if this is WAKEUP event */
 		if (status & 0x1)
-			dev_info(ndev_dev(ndev), "Wakeup is done.\n");
+			dev_info(dev, "Wakeup is done.\n");

 		amd_ack_smu(ndev, AMD_PEER_D0_EVENT);

@ -512,14 +547,14 @@ static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
 				      AMD_LINK_HB_TIMEOUT);
 		break;
 	default:
-		dev_info(ndev_dev(ndev), "event status = 0x%x.\n", status);
+		dev_info(dev, "event status = 0x%x.\n", status);
 		break;
 	}
 }

 static irqreturn_t ndev_interrupt(struct amd_ntb_dev *ndev, int vec)
 {
-	dev_dbg(ndev_dev(ndev), "vec %d\n", vec);
+	dev_dbg(&ndev->ntb.pdev->dev, "vec %d\n", vec);

 	if (vec > (AMD_DB_CNT - 1) || (ndev->msix_vec_count == 1))
 		amd_handle_event(ndev, vec);
@ -541,7 +576,7 @@ static irqreturn_t ndev_irq_isr(int irq, void *dev)
 {
 	struct amd_ntb_dev *ndev = dev;

-	return ndev_interrupt(ndev, irq - ndev_pdev(ndev)->irq);
+	return ndev_interrupt(ndev, irq - ndev->ntb.pdev->irq);
 }

 static int ndev_init_isr(struct amd_ntb_dev *ndev,
@ -550,7 +585,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
 	struct pci_dev *pdev;
 	int rc, i, msix_count, node;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	node = dev_to_node(&pdev->dev);

@ -592,7 +627,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
 			goto err_msix_request;
 	}

-	dev_dbg(ndev_dev(ndev), "Using msix interrupts\n");
+	dev_dbg(&pdev->dev, "Using msix interrupts\n");
 	ndev->db_count = msix_min;
 	ndev->msix_vec_count = msix_max;
 	return 0;
@ -619,7 +654,7 @@ err_msix_vec_alloc:
 	if (rc)
 		goto err_msi_request;

-	dev_dbg(ndev_dev(ndev), "Using msi interrupts\n");
+	dev_dbg(&pdev->dev, "Using msi interrupts\n");
 	ndev->db_count = 1;
 	ndev->msix_vec_count = 1;
 	return 0;
@ -636,7 +671,7 @@ err_msi_enable:
 	if (rc)
 		goto err_intx_request;

-	dev_dbg(ndev_dev(ndev), "Using intx interrupts\n");
+	dev_dbg(&pdev->dev, "Using intx interrupts\n");
 	ndev->db_count = 1;
 	ndev->msix_vec_count = 1;
 	return 0;
@ -651,7 +686,7 @@ static void ndev_deinit_isr(struct amd_ntb_dev *ndev)
 	void __iomem *mmio = ndev->self_mmio;
 	int i;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	/* Mask all doorbell interrupts */
 	ndev->db_mask = ndev->db_valid_mask;
@ -777,7 +812,8 @@ static void ndev_init_debugfs(struct amd_ntb_dev *ndev)
 		ndev->debugfs_info = NULL;
 	} else {
 		ndev->debugfs_dir =
-			debugfs_create_dir(ndev_name(ndev), debugfs_dir);
+			debugfs_create_dir(pci_name(ndev->ntb.pdev),
+					   debugfs_dir);
 		if (!ndev->debugfs_dir)
 			ndev->debugfs_info = NULL;
 		else
@ -812,7 +848,7 @@ static int amd_poll_link(struct amd_ntb_dev *ndev)
 	reg = readl(mmio + AMD_SIDEINFO_OFFSET);
 	reg &= NTB_LIN_STA_ACTIVE_BIT;

-	dev_dbg(ndev_dev(ndev), "%s: reg_val = 0x%x.\n", __func__, reg);
+	dev_dbg(&ndev->ntb.pdev->dev, "%s: reg_val = 0x%x.\n", __func__, reg);

 	if (reg == ndev->cntl_sta)
 		return 0;
@ -894,7 +930,8 @@ static int amd_init_ntb(struct amd_ntb_dev *ndev)

 		break;
 	default:
-		dev_err(ndev_dev(ndev), "AMD NTB does not support B2B mode.\n");
+		dev_err(&ndev->ntb.pdev->dev,
+			"AMD NTB does not support B2B mode.\n");
 		return -EINVAL;
 	}

@ -923,10 +960,10 @@ static int amd_init_dev(struct amd_ntb_dev *ndev)
 	struct pci_dev *pdev;
 	int rc = 0;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	ndev->ntb.topo = amd_get_topo(ndev);
-	dev_dbg(ndev_dev(ndev), "AMD NTB topo is %s\n",
+	dev_dbg(&pdev->dev, "AMD NTB topo is %s\n",
 		ntb_topo_string(ndev->ntb.topo));

 	rc = amd_init_ntb(ndev);
@ -935,7 +972,7 @@ static int amd_init_dev(struct amd_ntb_dev *ndev)

 	rc = amd_init_isr(ndev);
 	if (rc) {
-		dev_err(ndev_dev(ndev), "fail to init isr.\n");
+		dev_err(&pdev->dev, "fail to init isr.\n");
 		return rc;
 	}

@ -973,7 +1010,7 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
 		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (rc)
 			goto err_dma_mask;
-		dev_warn(ndev_dev(ndev), "Cannot DMA highmem\n");
+		dev_warn(&pdev->dev, "Cannot DMA highmem\n");
 	}

 	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
@ -981,7 +1018,7 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
 		rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (rc)
 			goto err_dma_mask;
-		dev_warn(ndev_dev(ndev), "Cannot DMA consistent highmem\n");
+		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
 	}

 	ndev->self_mmio = pci_iomap(pdev, 0, 0);
@ -1004,7 +1041,7 @@ err_pci_enable:

 static void amd_ntb_deinit_pci(struct amd_ntb_dev *ndev)
 {
-	struct pci_dev *pdev = ndev_pdev(ndev);
+	struct pci_dev *pdev = ndev->ntb.pdev;

 	pci_iounmap(pdev, ndev->self_mmio);

--- a/drivers/ntb/hw/amd/ntb_hw_amd.h
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.h
@ -211,9 +211,6 @@ struct amd_ntb_dev {
 	struct dentry *debugfs_info;
 };

-#define ndev_pdev(ndev) ((ndev)->ntb.pdev)
-#define ndev_name(ndev) pci_name(ndev_pdev(ndev))
-#define ndev_dev(ndev) (&ndev_pdev(ndev)->dev)
 #define ntb_ndev(__ntb) container_of(__ntb, struct amd_ntb_dev, ntb)
 #define hb_ndev(__work) container_of(__work, struct amd_ntb_dev, hb_timer.work)

--- a/drivers/ntb/hw/idt/Kconfig
+++ b/drivers/ntb/hw/idt/Kconfig
@ -0,0 +1,31 @@
+config NTB_IDT
+	tristate "IDT PCIe-switch Non-Transparent Bridge support"
+	depends on PCI
+	help
+	 This driver supports NTB of cappable IDT PCIe-switches.
+
+	 Some of the pre-initializations must be made before IDT PCIe-switch
+	 exposes it NT-functions correctly. It should be done by either proper
+	 initialisation of EEPROM connected to master smbus of the switch or
+	 by BIOS using slave-SMBus interface changing corresponding registers
+	 value. Evidently it must be done before PCI bus enumeration is
+	 finished in Linux kernel.
+
+	 First of all partitions must be activated and properly assigned to all
+	 the ports with NT-functions intended to be activated (see SWPARTxCTL
+	 and SWPORTxCTL registers). Then all NT-function BARs must be enabled
+	 with chosen valid aperture. For memory windows related BARs the
+	 aperture settings shall determine the maximum size of memory windows
+	 accepted by a BAR. Note that BAR0 must map PCI configuration space
+	 registers.
+
+	 It's worth to note, that since a part of this driver relies on the
+	 BAR settings of peer NT-functions, the BAR setups can't be done over
+	 kernel PCI fixups. That's why the alternative pre-initialization
+	 techniques like BIOS using SMBus interface or EEPROM should be
+	 utilized. Additionally if one needs to have temperature sensor
+	 information printed to system log, the corresponding registers must
+	 be initialized within BIOS/EEPROM as well.
+
+	 If unsure, say N.
+
--- a/drivers/ntb/hw/idt/Makefile
+++ b/drivers/ntb/hw/idt/Makefile
@ -0,0 +1 @@
+obj-$(CONFIG_NTB_IDT) += ntb_hw_idt.o
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
--- a/drivers/ntb/hw/idt/ntb_hw_idt.h
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.h
--- a/drivers/ntb/hw/intel/ntb_hw_intel.c
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
@ -6,6 +6,7 @@
 *
 *   Copyright(c) 2012 Intel Corporation. All rights reserved.
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of version 2 of the GNU General Public License as
@ -15,6 +16,7 @@
 *
 *   Copyright(c) 2012 Intel Corporation. All rights reserved.
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
@ -270,12 +272,12 @@ static inline int ndev_db_addr(struct intel_ntb_dev *ndev,

 	if (db_addr) {
 		*db_addr = reg_addr + reg;
-		dev_dbg(ndev_dev(ndev), "Peer db addr %llx\n", *db_addr);
+		dev_dbg(&ndev->ntb.pdev->dev, "Peer db addr %llx\n", *db_addr);
 	}

 	if (db_size) {
 		*db_size = ndev->reg->db_size;
-		dev_dbg(ndev_dev(ndev), "Peer db size %llx\n", *db_size);
+		dev_dbg(&ndev->ntb.pdev->dev, "Peer db size %llx\n", *db_size);
 	}

 	return 0;
@ -368,7 +370,8 @@ static inline int ndev_spad_addr(struct intel_ntb_dev *ndev, int idx,

 	if (spad_addr) {
 		*spad_addr = reg_addr + reg + (idx << 2);
-		dev_dbg(ndev_dev(ndev), "Peer spad addr %llx\n", *spad_addr);
+		dev_dbg(&ndev->ntb.pdev->dev, "Peer spad addr %llx\n",
+			*spad_addr);
 	}

 	return 0;
@ -409,7 +412,7 @@ static irqreturn_t ndev_interrupt(struct intel_ntb_dev *ndev, int vec)
 	if ((ndev->hwerr_flags & NTB_HWERR_MSIX_VECTOR32_BAD) && (vec == 31))
 		vec_mask |= ndev->db_link_mask;

-	dev_dbg(ndev_dev(ndev), "vec %d vec_mask %llx\n", vec, vec_mask);
+	dev_dbg(&ndev->ntb.pdev->dev, "vec %d vec_mask %llx\n", vec, vec_mask);

 	ndev->last_ts = jiffies;

@ -428,7 +431,7 @@ static irqreturn_t ndev_vec_isr(int irq, void *dev)
 {
 	struct intel_ntb_vec *nvec = dev;

-	dev_dbg(ndev_dev(nvec->ndev), "irq: %d  nvec->num: %d\n",
+	dev_dbg(&nvec->ndev->ntb.pdev->dev, "irq: %d  nvec->num: %d\n",
 		irq, nvec->num);

 	return ndev_interrupt(nvec->ndev, nvec->num);
@ -438,7 +441,7 @@ static irqreturn_t ndev_irq_isr(int irq, void *dev)
 {
 	struct intel_ntb_dev *ndev = dev;

-	return ndev_interrupt(ndev, irq - ndev_pdev(ndev)->irq);
+	return ndev_interrupt(ndev, irq - ndev->ntb.pdev->irq);
 }

 static int ndev_init_isr(struct intel_ntb_dev *ndev,
@ -448,7 +451,7 @@ static int ndev_init_isr(struct intel_ntb_dev *ndev,
 	struct pci_dev *pdev;
 	int rc, i, msix_count, node;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	node = dev_to_node(&pdev->dev);

@ -487,7 +490,7 @@ static int ndev_init_isr(struct intel_ntb_dev *ndev,
 			goto err_msix_request;
 	}

-	dev_dbg(ndev_dev(ndev), "Using %d msix interrupts\n", msix_count);
+	dev_dbg(&pdev->dev, "Using %d msix interrupts\n", msix_count);
 	ndev->db_vec_count = msix_count;
 	ndev->db_vec_shift = msix_shift;
 	return 0;
@ -515,7 +518,7 @@ err_msix_vec_alloc:
 	if (rc)
 		goto err_msi_request;

-	dev_dbg(ndev_dev(ndev), "Using msi interrupts\n");
+	dev_dbg(&pdev->dev, "Using msi interrupts\n");
 	ndev->db_vec_count = 1;
 	ndev->db_vec_shift = total_shift;
 	return 0;
@ -533,7 +536,7 @@ err_msi_enable:
 	if (rc)
 		goto err_intx_request;

-	dev_dbg(ndev_dev(ndev), "Using intx interrupts\n");
+	dev_dbg(&pdev->dev, "Using intx interrupts\n");
 	ndev->db_vec_count = 1;
 	ndev->db_vec_shift = total_shift;
 	return 0;
@ -547,7 +550,7 @@ static void ndev_deinit_isr(struct intel_ntb_dev *ndev)
 	struct pci_dev *pdev;
 	int i;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	/* Mask all doorbell interrupts */
 	ndev->db_mask = ndev->db_valid_mask;
@ -744,7 +747,7 @@ static ssize_t ndev_ntb_debugfs_read(struct file *filp, char __user *ubuf,
 	union { u64 v64; u32 v32; u16 v16; u8 v8; } u;

 	ndev = filp->private_data;
-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;
 	mmio = ndev->self_mmio;

 	buf_size = min(count, 0x800ul);
@ -1019,7 +1022,8 @@ static void ndev_init_debugfs(struct intel_ntb_dev *ndev)
 		ndev->debugfs_info = NULL;
 	} else {
 		ndev->debugfs_dir =
-			debugfs_create_dir(ndev_name(ndev), debugfs_dir);
+			debugfs_create_dir(pci_name(ndev->ntb.pdev),
+					   debugfs_dir);
 		if (!ndev->debugfs_dir)
 			ndev->debugfs_info = NULL;
 		else
@ -1035,20 +1039,26 @@ static void ndev_deinit_debugfs(struct intel_ntb_dev *ndev)
 	debugfs_remove_recursive(ndev->debugfs_dir);
 }

-static int intel_ntb_mw_count(struct ntb_dev *ntb)
+static int intel_ntb_mw_count(struct ntb_dev *ntb, int pidx)
 {
+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	return ntb_ndev(ntb)->mw_count;
 }

-static int intel_ntb_mw_get_range(struct ntb_dev *ntb, int idx,
-				  phys_addr_t *base,
-				  resource_size_t *size,
-				  resource_size_t *align,
-				  resource_size_t *align_size)
+static int intel_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
+				  resource_size_t *addr_align,
+				  resource_size_t *size_align,
+				  resource_size_t *size_max)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);
+	resource_size_t bar_size, mw_size;
 	int bar;

+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	if (idx >= ndev->b2b_idx && !ndev->b2b_off)
 		idx += 1;

@ -1056,24 +1066,26 @@ static int intel_ntb_mw_get_range(struct ntb_dev *ntb, int idx,
 	if (bar < 0)
 		return bar;

-	if (base)
-		*base = pci_resource_start(ndev->ntb.pdev, bar) +
-			(idx == ndev->b2b_idx ? ndev->b2b_off : 0);
+	bar_size = pci_resource_len(ndev->ntb.pdev, bar);

-	if (size)
-		*size = pci_resource_len(ndev->ntb.pdev, bar) -
-			(idx == ndev->b2b_idx ? ndev->b2b_off : 0);
+	if (idx == ndev->b2b_idx)
+		mw_size = bar_size - ndev->b2b_off;
+	else
+		mw_size = bar_size;

-	if (align)
-		*align = pci_resource_len(ndev->ntb.pdev, bar);
+	if (addr_align)
+		*addr_align = pci_resource_len(ndev->ntb.pdev, bar);

-	if (align_size)
-		*align_size = 1;
+	if (size_align)
+		*size_align = 1;
+
+	if (size_max)
+		*size_max = mw_size;

 	return 0;
 }

-static int intel_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
+static int intel_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
 				  dma_addr_t addr, resource_size_t size)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);
@ -1083,6 +1095,9 @@ static int intel_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
 	u64 base, limit, reg_val;
 	int bar;

+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	if (idx >= ndev->b2b_idx && !ndev->b2b_off)
 		idx += 1;

@ -1171,7 +1186,7 @@ static int intel_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
 	return 0;
 }

-static int intel_ntb_link_is_up(struct ntb_dev *ntb,
+static u64 intel_ntb_link_is_up(struct ntb_dev *ntb,
 				enum ntb_speed *speed,
 				enum ntb_width *width)
 {
@ -1206,13 +1221,13 @@ static int intel_ntb_link_enable(struct ntb_dev *ntb,
 	if (ndev->ntb.topo == NTB_TOPO_SEC)
 		return -EINVAL;

-	dev_dbg(ndev_dev(ndev),
+	dev_dbg(&ntb->pdev->dev,
 		"Enabling link with max_speed %d max_width %d\n",
 		max_speed, max_width);
 	if (max_speed != NTB_SPEED_AUTO)
-		dev_dbg(ndev_dev(ndev), "ignoring max_speed %d\n", max_speed);
+		dev_dbg(&ntb->pdev->dev, "ignoring max_speed %d\n", max_speed);
 	if (max_width != NTB_WIDTH_AUTO)
-		dev_dbg(ndev_dev(ndev), "ignoring max_width %d\n", max_width);
+		dev_dbg(&ntb->pdev->dev, "ignoring max_width %d\n", max_width);

 	ntb_ctl = ioread32(ndev->self_mmio + ndev->reg->ntb_ctl);
 	ntb_ctl &= ~(NTB_CTL_DISABLE | NTB_CTL_CFG_LOCK);
@ -1235,7 +1250,7 @@ static int intel_ntb_link_disable(struct ntb_dev *ntb)
 	if (ndev->ntb.topo == NTB_TOPO_SEC)
 		return -EINVAL;

-	dev_dbg(ndev_dev(ndev), "Disabling link\n");
+	dev_dbg(&ntb->pdev->dev, "Disabling link\n");

 	/* Bring NTB link down */
 	ntb_cntl = ioread32(ndev->self_mmio + ndev->reg->ntb_ctl);
@ -1249,6 +1264,36 @@ static int intel_ntb_link_disable(struct ntb_dev *ntb)
 	return 0;
 }

+static int intel_ntb_peer_mw_count(struct ntb_dev *ntb)
+{
+	/* Numbers of inbound and outbound memory windows match */
+	return ntb_ndev(ntb)->mw_count;
+}
+
+static int intel_ntb_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
+				     phys_addr_t *base, resource_size_t *size)
+{
+	struct intel_ntb_dev *ndev = ntb_ndev(ntb);
+	int bar;
+
+	if (idx >= ndev->b2b_idx && !ndev->b2b_off)
+		idx += 1;
+
+	bar = ndev_mw_to_bar(ndev, idx);
+	if (bar < 0)
+		return bar;
+
+	if (base)
+		*base = pci_resource_start(ndev->ntb.pdev, bar) +
+			(idx == ndev->b2b_idx ? ndev->b2b_off : 0);
+
+	if (size)
+		*size = pci_resource_len(ndev->ntb.pdev, bar) -
+			(idx == ndev->b2b_idx ? ndev->b2b_off : 0);
+
+	return 0;
+}
+
 static int intel_ntb_db_is_unsafe(struct ntb_dev *ntb)
 {
 	return ndev_ignore_unsafe(ntb_ndev(ntb), NTB_UNSAFE_DB);
@ -1366,30 +1411,30 @@ static int intel_ntb_spad_write(struct ntb_dev *ntb,
 			       ndev->self_reg->spad);
 }

-static int intel_ntb_peer_spad_addr(struct ntb_dev *ntb, int idx,
+static int intel_ntb_peer_spad_addr(struct ntb_dev *ntb, int pidx, int sidx,
 				    phys_addr_t *spad_addr)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);

-	return ndev_spad_addr(ndev, idx, spad_addr, ndev->peer_addr,
+	return ndev_spad_addr(ndev, sidx, spad_addr, ndev->peer_addr,
 			      ndev->peer_reg->spad);
 }

-static u32 intel_ntb_peer_spad_read(struct ntb_dev *ntb, int idx)
+static u32 intel_ntb_peer_spad_read(struct ntb_dev *ntb, int pidx, int sidx)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);

-	return ndev_spad_read(ndev, idx,
+	return ndev_spad_read(ndev, sidx,
 			      ndev->peer_mmio +
 			      ndev->peer_reg->spad);
 }

-static int intel_ntb_peer_spad_write(struct ntb_dev *ntb,
-				     int idx, u32 val)
+static int intel_ntb_peer_spad_write(struct ntb_dev *ntb, int pidx,
+				     int sidx, u32 val)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);

-	return ndev_spad_write(ndev, idx, val,
+	return ndev_spad_write(ndev, sidx, val,
 			       ndev->peer_mmio +
 			       ndev->peer_reg->spad);
 }
@ -1442,30 +1487,33 @@ static int atom_link_is_err(struct intel_ntb_dev *ndev)

 static inline enum ntb_topo atom_ppd_topo(struct intel_ntb_dev *ndev, u32 ppd)
 {
+	struct device *dev = &ndev->ntb.pdev->dev;
+
 	switch (ppd & ATOM_PPD_TOPO_MASK) {
 	case ATOM_PPD_TOPO_B2B_USD:
-		dev_dbg(ndev_dev(ndev), "PPD %d B2B USD\n", ppd);
+		dev_dbg(dev, "PPD %d B2B USD\n", ppd);
 		return NTB_TOPO_B2B_USD;

 	case ATOM_PPD_TOPO_B2B_DSD:
-		dev_dbg(ndev_dev(ndev), "PPD %d B2B DSD\n", ppd);
+		dev_dbg(dev, "PPD %d B2B DSD\n", ppd);
 		return NTB_TOPO_B2B_DSD;

 	case ATOM_PPD_TOPO_PRI_USD:
 	case ATOM_PPD_TOPO_PRI_DSD: /* accept bogus PRI_DSD */
 	case ATOM_PPD_TOPO_SEC_USD:
 	case ATOM_PPD_TOPO_SEC_DSD: /* accept bogus SEC_DSD */
-		dev_dbg(ndev_dev(ndev), "PPD %d non B2B disabled\n", ppd);
+		dev_dbg(dev, "PPD %d non B2B disabled\n", ppd);
 		return NTB_TOPO_NONE;
 	}

-	dev_dbg(ndev_dev(ndev), "PPD %d invalid\n", ppd);
+	dev_dbg(dev, "PPD %d invalid\n", ppd);
 	return NTB_TOPO_NONE;
 }

 static void atom_link_hb(struct work_struct *work)
 {
 	struct intel_ntb_dev *ndev = hb_ndev(work);
+	struct device *dev = &ndev->ntb.pdev->dev;
 	unsigned long poll_ts;
 	void __iomem *mmio;
 	u32 status32;
@ -1503,30 +1551,30 @@ static void atom_link_hb(struct work_struct *work)

 	/* Clear AER Errors, write to clear */
 	status32 = ioread32(mmio + ATOM_ERRCORSTS_OFFSET);
-	dev_dbg(ndev_dev(ndev), "ERRCORSTS = %x\n", status32);
+	dev_dbg(dev, "ERRCORSTS = %x\n", status32);
 	status32 &= PCI_ERR_COR_REP_ROLL;
 	iowrite32(status32, mmio + ATOM_ERRCORSTS_OFFSET);

 	/* Clear unexpected electrical idle event in LTSSM, write to clear */
 	status32 = ioread32(mmio + ATOM_LTSSMERRSTS0_OFFSET);
-	dev_dbg(ndev_dev(ndev), "LTSSMERRSTS0 = %x\n", status32);
+	dev_dbg(dev, "LTSSMERRSTS0 = %x\n", status32);
 	status32 |= ATOM_LTSSMERRSTS0_UNEXPECTEDEI;
 	iowrite32(status32, mmio + ATOM_LTSSMERRSTS0_OFFSET);

 	/* Clear DeSkew Buffer error, write to clear */
 	status32 = ioread32(mmio + ATOM_DESKEWSTS_OFFSET);
-	dev_dbg(ndev_dev(ndev), "DESKEWSTS = %x\n", status32);
+	dev_dbg(dev, "DESKEWSTS = %x\n", status32);
 	status32 |= ATOM_DESKEWSTS_DBERR;
 	iowrite32(status32, mmio + ATOM_DESKEWSTS_OFFSET);

 	status32 = ioread32(mmio + ATOM_IBSTERRRCRVSTS0_OFFSET);
-	dev_dbg(ndev_dev(ndev), "IBSTERRRCRVSTS0 = %x\n", status32);
+	dev_dbg(dev, "IBSTERRRCRVSTS0 = %x\n", status32);
 	status32 &= ATOM_IBIST_ERR_OFLOW;
 	iowrite32(status32, mmio + ATOM_IBSTERRRCRVSTS0_OFFSET);

 	/* Releases the NTB state machine to allow the link to retrain */
 	status32 = ioread32(mmio + ATOM_LTSSMSTATEJMP_OFFSET);
-	dev_dbg(ndev_dev(ndev), "LTSSMSTATEJMP = %x\n", status32);
+	dev_dbg(dev, "LTSSMSTATEJMP = %x\n", status32);
 	status32 &= ~ATOM_LTSSMSTATEJMP_FORCEDETECT;
 	iowrite32(status32, mmio + ATOM_LTSSMSTATEJMP_OFFSET);

@ -1699,11 +1747,11 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	int b2b_bar;
 	u8 bar_sz;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;
 	mmio = ndev->self_mmio;

 	if (ndev->b2b_idx == UINT_MAX) {
-		dev_dbg(ndev_dev(ndev), "not using b2b mw\n");
+		dev_dbg(&pdev->dev, "not using b2b mw\n");
 		b2b_bar = 0;
 		ndev->b2b_off = 0;
 	} else {
@ -1711,24 +1759,21 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		if (b2b_bar < 0)
 			return -EIO;

-		dev_dbg(ndev_dev(ndev), "using b2b mw bar %d\n", b2b_bar);
+		dev_dbg(&pdev->dev, "using b2b mw bar %d\n", b2b_bar);

 		bar_size = pci_resource_len(ndev->ntb.pdev, b2b_bar);

-		dev_dbg(ndev_dev(ndev), "b2b bar size %#llx\n", bar_size);
+		dev_dbg(&pdev->dev, "b2b bar size %#llx\n", bar_size);

 		if (b2b_mw_share && ((bar_size >> 1) >= XEON_B2B_MIN_SIZE)) {
-			dev_dbg(ndev_dev(ndev),
-				"b2b using first half of bar\n");
+			dev_dbg(&pdev->dev, "b2b using first half of bar\n");
 			ndev->b2b_off = bar_size >> 1;
 		} else if (bar_size >= XEON_B2B_MIN_SIZE) {
-			dev_dbg(ndev_dev(ndev),
-				"b2b using whole bar\n");
+			dev_dbg(&pdev->dev, "b2b using whole bar\n");
 			ndev->b2b_off = 0;
 			--ndev->mw_count;
 		} else {
-			dev_dbg(ndev_dev(ndev),
-				"b2b bar size is too small\n");
+			dev_dbg(&pdev->dev, "b2b bar size is too small\n");
 			return -EIO;
 		}
 	}
@ -1738,7 +1783,7 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	 * except disable or halve the size of the b2b secondary bar.
 	 */
 	pci_read_config_byte(pdev, SKX_IMBAR1SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "IMBAR1SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "IMBAR1SZ %#x\n", bar_sz);
 	if (b2b_bar == 1) {
 		if (ndev->b2b_off)
 			bar_sz -= 1;
@ -1748,10 +1793,10 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,

 	pci_write_config_byte(pdev, SKX_EMBAR1SZ_OFFSET, bar_sz);
 	pci_read_config_byte(pdev, SKX_EMBAR1SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "EMBAR1SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "EMBAR1SZ %#x\n", bar_sz);

 	pci_read_config_byte(pdev, SKX_IMBAR2SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "IMBAR2SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "IMBAR2SZ %#x\n", bar_sz);
 	if (b2b_bar == 2) {
 		if (ndev->b2b_off)
 			bar_sz -= 1;
@ -1761,7 +1806,7 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,

 	pci_write_config_byte(pdev, SKX_EMBAR2SZ_OFFSET, bar_sz);
 	pci_read_config_byte(pdev, SKX_EMBAR2SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "EMBAR2SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "EMBAR2SZ %#x\n", bar_sz);

 	/* SBAR01 hit by first part of the b2b bar */
 	if (b2b_bar == 0)
@ -1777,12 +1822,12 @@ static int skx_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	bar_addr = addr->bar2_addr64 + (b2b_bar == 1 ? ndev->b2b_off : 0);
 	iowrite64(bar_addr, mmio + SKX_IMBAR1XLMT_OFFSET);
 	bar_addr = ioread64(mmio + SKX_IMBAR1XLMT_OFFSET);
-	dev_dbg(ndev_dev(ndev), "IMBAR1XLMT %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "IMBAR1XLMT %#018llx\n", bar_addr);

 	bar_addr = addr->bar4_addr64 + (b2b_bar == 2 ? ndev->b2b_off : 0);
 	iowrite64(bar_addr, mmio + SKX_IMBAR2XLMT_OFFSET);
 	bar_addr = ioread64(mmio + SKX_IMBAR2XLMT_OFFSET);
-	dev_dbg(ndev_dev(ndev), "IMBAR2XLMT %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "IMBAR2XLMT %#018llx\n", bar_addr);

 	/* zero incoming translation addrs */
 	iowrite64(0, mmio + SKX_IMBAR1XBASE_OFFSET);
@ -1852,7 +1897,7 @@ static int skx_init_dev(struct intel_ntb_dev *ndev)
 	u8 ppd;
 	int rc;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	ndev->reg = &skx_reg;

@ -1861,7 +1906,7 @@ static int skx_init_dev(struct intel_ntb_dev *ndev)
 		return -EIO;

 	ndev->ntb.topo = xeon_ppd_topo(ndev, ppd);
-	dev_dbg(ndev_dev(ndev), "ppd %#x topo %s\n", ppd,
+	dev_dbg(&pdev->dev, "ppd %#x topo %s\n", ppd,
 		ntb_topo_string(ndev->ntb.topo));
 	if (ndev->ntb.topo == NTB_TOPO_NONE)
 		return -EINVAL;
@ -1885,14 +1930,14 @@ static int intel_ntb3_link_enable(struct ntb_dev *ntb,

 	ndev = container_of(ntb, struct intel_ntb_dev, ntb);

-	dev_dbg(ndev_dev(ndev),
+	dev_dbg(&ntb->pdev->dev,
 		"Enabling link with max_speed %d max_width %d\n",
 		max_speed, max_width);

 	if (max_speed != NTB_SPEED_AUTO)
-		dev_dbg(ndev_dev(ndev), "ignoring max_speed %d\n", max_speed);
+		dev_dbg(&ntb->pdev->dev, "ignoring max_speed %d\n", max_speed);
 	if (max_width != NTB_WIDTH_AUTO)
-		dev_dbg(ndev_dev(ndev), "ignoring max_width %d\n", max_width);
+		dev_dbg(&ntb->pdev->dev, "ignoring max_width %d\n", max_width);

 	ntb_ctl = ioread32(ndev->self_mmio + ndev->reg->ntb_ctl);
 	ntb_ctl &= ~(NTB_CTL_DISABLE | NTB_CTL_CFG_LOCK);
@ -1902,7 +1947,7 @@ static int intel_ntb3_link_enable(struct ntb_dev *ntb,

 	return 0;
 }
-static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int idx,
+static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
 				   dma_addr_t addr, resource_size_t size)
 {
 	struct intel_ntb_dev *ndev = ntb_ndev(ntb);
@ -1912,6 +1957,9 @@ static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int idx,
 	u64 base, limit, reg_val;
 	int bar;

+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
 	if (idx >= ndev->b2b_idx && !ndev->b2b_off)
 		idx += 1;

@ -1953,7 +2001,7 @@ static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int idx,
 		return -EIO;
 	}

-	dev_dbg(ndev_dev(ndev), "BAR %d IMBARXBASE: %#Lx\n", bar, reg_val);
+	dev_dbg(&ntb->pdev->dev, "BAR %d IMBARXBASE: %#Lx\n", bar, reg_val);

 	/* set and verify setting the limit */
 	iowrite64(limit, mmio + limit_reg);
@ -1964,7 +2012,7 @@ static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int idx,
 		return -EIO;
 	}

-	dev_dbg(ndev_dev(ndev), "BAR %d IMBARXLMT: %#Lx\n", bar, reg_val);
+	dev_dbg(&ntb->pdev->dev, "BAR %d IMBARXLMT: %#Lx\n", bar, reg_val);

 	/* setup the EP */
 	limit_reg = ndev->xlat_reg->bar2_limit + (idx * 0x10) + 0x4000;
@ -1985,7 +2033,7 @@ static int intel_ntb3_mw_set_trans(struct ntb_dev *ntb, int idx,
 		return -EIO;
 	}

-	dev_dbg(ndev_dev(ndev), "BAR %d EMBARXLMT: %#Lx\n", bar, reg_val);
+	dev_dbg(&ntb->pdev->dev, "BAR %d EMBARXLMT: %#Lx\n", bar, reg_val);

 	return 0;
 }
@ -2092,7 +2140,7 @@ static inline enum ntb_topo xeon_ppd_topo(struct intel_ntb_dev *ndev, u8 ppd)
 static inline int xeon_ppd_bar4_split(struct intel_ntb_dev *ndev, u8 ppd)
 {
 	if (ppd & XEON_PPD_SPLIT_BAR_MASK) {
-		dev_dbg(ndev_dev(ndev), "PPD %d split bar\n", ppd);
+		dev_dbg(&ndev->ntb.pdev->dev, "PPD %d split bar\n", ppd);
 		return 1;
 	}
 	return 0;
@ -2122,11 +2170,11 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	int b2b_bar;
 	u8 bar_sz;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;
 	mmio = ndev->self_mmio;

 	if (ndev->b2b_idx == UINT_MAX) {
-		dev_dbg(ndev_dev(ndev), "not using b2b mw\n");
+		dev_dbg(&pdev->dev, "not using b2b mw\n");
 		b2b_bar = 0;
 		ndev->b2b_off = 0;
 	} else {
@ -2134,24 +2182,21 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		if (b2b_bar < 0)
 			return -EIO;

-		dev_dbg(ndev_dev(ndev), "using b2b mw bar %d\n", b2b_bar);
+		dev_dbg(&pdev->dev, "using b2b mw bar %d\n", b2b_bar);

 		bar_size = pci_resource_len(ndev->ntb.pdev, b2b_bar);

-		dev_dbg(ndev_dev(ndev), "b2b bar size %#llx\n", bar_size);
+		dev_dbg(&pdev->dev, "b2b bar size %#llx\n", bar_size);

 		if (b2b_mw_share && XEON_B2B_MIN_SIZE <= bar_size >> 1) {
-			dev_dbg(ndev_dev(ndev),
-				"b2b using first half of bar\n");
+			dev_dbg(&pdev->dev, "b2b using first half of bar\n");
 			ndev->b2b_off = bar_size >> 1;
 		} else if (XEON_B2B_MIN_SIZE <= bar_size) {
-			dev_dbg(ndev_dev(ndev),
-				"b2b using whole bar\n");
+			dev_dbg(&pdev->dev, "b2b using whole bar\n");
 			ndev->b2b_off = 0;
 			--ndev->mw_count;
 		} else {
-			dev_dbg(ndev_dev(ndev),
-				"b2b bar size is too small\n");
+			dev_dbg(&pdev->dev, "b2b bar size is too small\n");
 			return -EIO;
 		}
 	}
@ -2163,7 +2208,7 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	 * offsets are not in a consistent order (bar5sz comes after ppd, odd).
 	 */
 	pci_read_config_byte(pdev, XEON_PBAR23SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "PBAR23SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "PBAR23SZ %#x\n", bar_sz);
 	if (b2b_bar == 2) {
 		if (ndev->b2b_off)
 			bar_sz -= 1;
@ -2172,11 +2217,11 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	}
 	pci_write_config_byte(pdev, XEON_SBAR23SZ_OFFSET, bar_sz);
 	pci_read_config_byte(pdev, XEON_SBAR23SZ_OFFSET, &bar_sz);
-	dev_dbg(ndev_dev(ndev), "SBAR23SZ %#x\n", bar_sz);
+	dev_dbg(&pdev->dev, "SBAR23SZ %#x\n", bar_sz);

 	if (!ndev->bar4_split) {
 		pci_read_config_byte(pdev, XEON_PBAR45SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "PBAR45SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "PBAR45SZ %#x\n", bar_sz);
 		if (b2b_bar == 4) {
 			if (ndev->b2b_off)
 				bar_sz -= 1;
@ -2185,10 +2230,10 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		}
 		pci_write_config_byte(pdev, XEON_SBAR45SZ_OFFSET, bar_sz);
 		pci_read_config_byte(pdev, XEON_SBAR45SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "SBAR45SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "SBAR45SZ %#x\n", bar_sz);
 	} else {
 		pci_read_config_byte(pdev, XEON_PBAR4SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "PBAR4SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "PBAR4SZ %#x\n", bar_sz);
 		if (b2b_bar == 4) {
 			if (ndev->b2b_off)
 				bar_sz -= 1;
@ -2197,10 +2242,10 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		}
 		pci_write_config_byte(pdev, XEON_SBAR4SZ_OFFSET, bar_sz);
 		pci_read_config_byte(pdev, XEON_SBAR4SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "SBAR4SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "SBAR4SZ %#x\n", bar_sz);

 		pci_read_config_byte(pdev, XEON_PBAR5SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "PBAR5SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "PBAR5SZ %#x\n", bar_sz);
 		if (b2b_bar == 5) {
 			if (ndev->b2b_off)
 				bar_sz -= 1;
@ -2209,7 +2254,7 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		}
 		pci_write_config_byte(pdev, XEON_SBAR5SZ_OFFSET, bar_sz);
 		pci_read_config_byte(pdev, XEON_SBAR5SZ_OFFSET, &bar_sz);
-		dev_dbg(ndev_dev(ndev), "SBAR5SZ %#x\n", bar_sz);
+		dev_dbg(&pdev->dev, "SBAR5SZ %#x\n", bar_sz);
 	}

 	/* SBAR01 hit by first part of the b2b bar */
@ -2226,7 +2271,7 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	else
 		return -EIO;

-	dev_dbg(ndev_dev(ndev), "SBAR01 %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "SBAR01 %#018llx\n", bar_addr);
 	iowrite64(bar_addr, mmio + XEON_SBAR0BASE_OFFSET);

 	/* Other SBAR are normally hit by the PBAR xlat, except for b2b bar.
@ -2237,26 +2282,26 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	bar_addr = addr->bar2_addr64 + (b2b_bar == 2 ? ndev->b2b_off : 0);
 	iowrite64(bar_addr, mmio + XEON_SBAR23BASE_OFFSET);
 	bar_addr = ioread64(mmio + XEON_SBAR23BASE_OFFSET);
-	dev_dbg(ndev_dev(ndev), "SBAR23 %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "SBAR23 %#018llx\n", bar_addr);

 	if (!ndev->bar4_split) {
 		bar_addr = addr->bar4_addr64 +
 			(b2b_bar == 4 ? ndev->b2b_off : 0);
 		iowrite64(bar_addr, mmio + XEON_SBAR45BASE_OFFSET);
 		bar_addr = ioread64(mmio + XEON_SBAR45BASE_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR45 %#018llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR45 %#018llx\n", bar_addr);
 	} else {
 		bar_addr = addr->bar4_addr32 +
 			(b2b_bar == 4 ? ndev->b2b_off : 0);
 		iowrite32(bar_addr, mmio + XEON_SBAR4BASE_OFFSET);
 		bar_addr = ioread32(mmio + XEON_SBAR4BASE_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR4 %#010llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR4 %#010llx\n", bar_addr);

 		bar_addr = addr->bar5_addr32 +
 			(b2b_bar == 5 ? ndev->b2b_off : 0);
 		iowrite32(bar_addr, mmio + XEON_SBAR5BASE_OFFSET);
 		bar_addr = ioread32(mmio + XEON_SBAR5BASE_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR5 %#010llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR5 %#010llx\n", bar_addr);
 	}

 	/* setup incoming bar limits == base addrs (zero length windows) */
@ -2264,26 +2309,26 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	bar_addr = addr->bar2_addr64 + (b2b_bar == 2 ? ndev->b2b_off : 0);
 	iowrite64(bar_addr, mmio + XEON_SBAR23LMT_OFFSET);
 	bar_addr = ioread64(mmio + XEON_SBAR23LMT_OFFSET);
-	dev_dbg(ndev_dev(ndev), "SBAR23LMT %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "SBAR23LMT %#018llx\n", bar_addr);

 	if (!ndev->bar4_split) {
 		bar_addr = addr->bar4_addr64 +
 			(b2b_bar == 4 ? ndev->b2b_off : 0);
 		iowrite64(bar_addr, mmio + XEON_SBAR45LMT_OFFSET);
 		bar_addr = ioread64(mmio + XEON_SBAR45LMT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR45LMT %#018llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR45LMT %#018llx\n", bar_addr);
 	} else {
 		bar_addr = addr->bar4_addr32 +
 			(b2b_bar == 4 ? ndev->b2b_off : 0);
 		iowrite32(bar_addr, mmio + XEON_SBAR4LMT_OFFSET);
 		bar_addr = ioread32(mmio + XEON_SBAR4LMT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR4LMT %#010llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR4LMT %#010llx\n", bar_addr);

 		bar_addr = addr->bar5_addr32 +
 			(b2b_bar == 5 ? ndev->b2b_off : 0);
 		iowrite32(bar_addr, mmio + XEON_SBAR5LMT_OFFSET);
 		bar_addr = ioread32(mmio + XEON_SBAR5LMT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "SBAR5LMT %#05llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "SBAR5LMT %#05llx\n", bar_addr);
 	}

 	/* zero incoming translation addrs */
@ -2309,23 +2354,23 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 	bar_addr = peer_addr->bar2_addr64;
 	iowrite64(bar_addr, mmio + XEON_PBAR23XLAT_OFFSET);
 	bar_addr = ioread64(mmio + XEON_PBAR23XLAT_OFFSET);
-	dev_dbg(ndev_dev(ndev), "PBAR23XLAT %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "PBAR23XLAT %#018llx\n", bar_addr);

 	if (!ndev->bar4_split) {
 		bar_addr = peer_addr->bar4_addr64;
 		iowrite64(bar_addr, mmio + XEON_PBAR45XLAT_OFFSET);
 		bar_addr = ioread64(mmio + XEON_PBAR45XLAT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "PBAR45XLAT %#018llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "PBAR45XLAT %#018llx\n", bar_addr);
 	} else {
 		bar_addr = peer_addr->bar4_addr32;
 		iowrite32(bar_addr, mmio + XEON_PBAR4XLAT_OFFSET);
 		bar_addr = ioread32(mmio + XEON_PBAR4XLAT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "PBAR4XLAT %#010llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "PBAR4XLAT %#010llx\n", bar_addr);

 		bar_addr = peer_addr->bar5_addr32;
 		iowrite32(bar_addr, mmio + XEON_PBAR5XLAT_OFFSET);
 		bar_addr = ioread32(mmio + XEON_PBAR5XLAT_OFFSET);
-		dev_dbg(ndev_dev(ndev), "PBAR5XLAT %#010llx\n", bar_addr);
+		dev_dbg(&pdev->dev, "PBAR5XLAT %#010llx\n", bar_addr);
 	}

 	/* set the translation offset for b2b registers */
@ -2343,7 +2388,7 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,
 		return -EIO;

 	/* B2B_XLAT_OFFSET is 64bit, but can only take 32bit writes */
-	dev_dbg(ndev_dev(ndev), "B2BXLAT %#018llx\n", bar_addr);
+	dev_dbg(&pdev->dev, "B2BXLAT %#018llx\n", bar_addr);
 	iowrite32(bar_addr, mmio + XEON_B2B_XLAT_OFFSETL);
 	iowrite32(bar_addr >> 32, mmio + XEON_B2B_XLAT_OFFSETU);

@ -2362,6 +2407,7 @@ static int xeon_setup_b2b_mw(struct intel_ntb_dev *ndev,

 static int xeon_init_ntb(struct intel_ntb_dev *ndev)
 {
+	struct device *dev = &ndev->ntb.pdev->dev;
 	int rc;
 	u32 ntb_ctl;

@ -2377,7 +2423,7 @@ static int xeon_init_ntb(struct intel_ntb_dev *ndev)
 	switch (ndev->ntb.topo) {
 	case NTB_TOPO_PRI:
 		if (ndev->hwerr_flags & NTB_HWERR_SDOORBELL_LOCKUP) {
-			dev_err(ndev_dev(ndev), "NTB Primary config disabled\n");
+			dev_err(dev, "NTB Primary config disabled\n");
 			return -EINVAL;
 		}

@ -2395,7 +2441,7 @@ static int xeon_init_ntb(struct intel_ntb_dev *ndev)

 	case NTB_TOPO_SEC:
 		if (ndev->hwerr_flags & NTB_HWERR_SDOORBELL_LOCKUP) {
-			dev_err(ndev_dev(ndev), "NTB Secondary config disabled\n");
+			dev_err(dev, "NTB Secondary config disabled\n");
 			return -EINVAL;
 		}
 		/* use half the spads for the peer */
@ -2420,18 +2466,17 @@ static int xeon_init_ntb(struct intel_ntb_dev *ndev)
 				ndev->b2b_idx = b2b_mw_idx;

 			if (ndev->b2b_idx >= ndev->mw_count) {
-				dev_dbg(ndev_dev(ndev),
+				dev_dbg(dev,
 					"b2b_mw_idx %d invalid for mw_count %u\n",
 					b2b_mw_idx, ndev->mw_count);
 				return -EINVAL;
 			}

-			dev_dbg(ndev_dev(ndev),
-				"setting up b2b mw idx %d means %d\n",
+			dev_dbg(dev, "setting up b2b mw idx %d means %d\n",
 				b2b_mw_idx, ndev->b2b_idx);

 		} else if (ndev->hwerr_flags & NTB_HWERR_B2BDOORBELL_BIT14) {
-			dev_warn(ndev_dev(ndev), "Reduce doorbell count by 1\n");
+			dev_warn(dev, "Reduce doorbell count by 1\n");
 			ndev->db_count -= 1;
 		}

@ -2472,7 +2517,7 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
 	u8 ppd;
 	int rc, mem;

-	pdev = ndev_pdev(ndev);
+	pdev = ndev->ntb.pdev;

 	switch (pdev->device) {
 	/* There is a Xeon hardware errata related to writes to SDOORBELL or
@ -2548,14 +2593,14 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
 		return -EIO;

 	ndev->ntb.topo = xeon_ppd_topo(ndev, ppd);
-	dev_dbg(ndev_dev(ndev), "ppd %#x topo %s\n", ppd,
+	dev_dbg(&pdev->dev, "ppd %#x topo %s\n", ppd,
 		ntb_topo_string(ndev->ntb.topo));
 	if (ndev->ntb.topo == NTB_TOPO_NONE)
 		return -EINVAL;

 	if (ndev->ntb.topo != NTB_TOPO_SEC) {
 		ndev->bar4_split = xeon_ppd_bar4_split(ndev, ppd);
-		dev_dbg(ndev_dev(ndev), "ppd %#x bar4_split %d\n",
+		dev_dbg(&pdev->dev, "ppd %#x bar4_split %d\n",
 			ppd, ndev->bar4_split);
 	} else {
 		/* This is a way for transparent BAR to figure out if we are
@ -2565,7 +2610,7 @@ static int xeon_init_dev(struct intel_ntb_dev *ndev)
 		mem = pci_select_bars(pdev, IORESOURCE_MEM);
 		ndev->bar4_split = hweight32(mem) ==
 			HSX_SPLIT_BAR_MW_COUNT + 1;
-		dev_dbg(ndev_dev(ndev), "mem %#x bar4_split %d\n",
+		dev_dbg(&pdev->dev, "mem %#x bar4_split %d\n",
 			mem, ndev->bar4_split);
 	}

@ -2602,7 +2647,7 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
 		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (rc)
 			goto err_dma_mask;
-		dev_warn(ndev_dev(ndev), "Cannot DMA highmem\n");
+		dev_warn(&pdev->dev, "Cannot DMA highmem\n");
 	}

 	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
@ -2610,7 +2655,7 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
 		rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (rc)
 			goto err_dma_mask;
-		dev_warn(ndev_dev(ndev), "Cannot DMA consistent highmem\n");
+		dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
 	}

 	ndev->self_mmio = pci_iomap(pdev, 0, 0);
@ -2636,7 +2681,7 @@ err_pci_enable:

 static void intel_ntb_deinit_pci(struct intel_ntb_dev *ndev)
 {
-	struct pci_dev *pdev = ndev_pdev(ndev);
+	struct pci_dev *pdev = ndev->ntb.pdev;

 	if (ndev->peer_mmio && ndev->peer_mmio != ndev->self_mmio)
 		pci_iounmap(pdev, ndev->peer_mmio);
@ -2906,8 +2951,10 @@ static const struct intel_ntb_xlat_reg skx_sec_xlat = {
 /* operations for primary side of local ntb */
 static const struct ntb_dev_ops intel_ntb_ops = {
 	.mw_count		= intel_ntb_mw_count,
-	.mw_get_range		= intel_ntb_mw_get_range,
+	.mw_get_align		= intel_ntb_mw_get_align,
 	.mw_set_trans		= intel_ntb_mw_set_trans,
+	.peer_mw_count		= intel_ntb_peer_mw_count,
+	.peer_mw_get_addr	= intel_ntb_peer_mw_get_addr,
 	.link_is_up		= intel_ntb_link_is_up,
 	.link_enable		= intel_ntb_link_enable,
 	.link_disable		= intel_ntb_link_disable,
@ -2932,8 +2979,10 @@ static const struct ntb_dev_ops intel_ntb_ops = {

 static const struct ntb_dev_ops intel_ntb3_ops = {
 	.mw_count		= intel_ntb_mw_count,
-	.mw_get_range		= intel_ntb_mw_get_range,
+	.mw_get_align		= intel_ntb_mw_get_align,
 	.mw_set_trans		= intel_ntb3_mw_set_trans,
+	.peer_mw_count		= intel_ntb_peer_mw_count,
+	.peer_mw_get_addr	= intel_ntb_peer_mw_get_addr,
 	.link_is_up		= intel_ntb_link_is_up,
 	.link_enable		= intel_ntb3_link_enable,
 	.link_disable		= intel_ntb_link_disable,
@ -3008,4 +3057,3 @@ static void __exit intel_ntb_pci_driver_exit(void)
 	debugfs_remove_recursive(debugfs_dir);
 }
 module_exit(intel_ntb_pci_driver_exit);
-
--- a/drivers/ntb/hw/intel/ntb_hw_intel.h
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.h
@ -382,9 +382,6 @@ struct intel_ntb_dev {
 	struct dentry			*debugfs_info;
 };

-#define ndev_pdev(ndev) ((ndev)->ntb.pdev)
-#define ndev_name(ndev) pci_name(ndev_pdev(ndev))
-#define ndev_dev(ndev) (&ndev_pdev(ndev)->dev)
 #define ntb_ndev(__ntb) container_of(__ntb, struct intel_ntb_dev, ntb)
 #define hb_ndev(__work) container_of(__work, struct intel_ntb_dev, \
 				     hb_timer.work)
--- a/drivers/ntb/ntb.c
+++ b/drivers/ntb/ntb.c
@ -5,6 +5,7 @@
 *   GPL LICENSE SUMMARY
 *
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of version 2 of the GNU General Public License as
@ -18,6 +19,7 @@
 *   BSD LICENSE
 *
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
@ -191,6 +193,73 @@ void ntb_db_event(struct ntb_dev *ntb, int vector)
 }
 EXPORT_SYMBOL(ntb_db_event);

+void ntb_msg_event(struct ntb_dev *ntb)
+{
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&ntb->ctx_lock, irqflags);
+	{
+		if (ntb->ctx_ops && ntb->ctx_ops->msg_event)
+			ntb->ctx_ops->msg_event(ntb->ctx);
+	}
+	spin_unlock_irqrestore(&ntb->ctx_lock, irqflags);
+}
+EXPORT_SYMBOL(ntb_msg_event);
+
+int ntb_default_port_number(struct ntb_dev *ntb)
+{
+	switch (ntb->topo) {
+	case NTB_TOPO_PRI:
+	case NTB_TOPO_B2B_USD:
+		return NTB_PORT_PRI_USD;
+	case NTB_TOPO_SEC:
+	case NTB_TOPO_B2B_DSD:
+		return NTB_PORT_SEC_DSD;
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(ntb_default_port_number);
+
+int ntb_default_peer_port_count(struct ntb_dev *ntb)
+{
+	return NTB_DEF_PEER_CNT;
+}
+EXPORT_SYMBOL(ntb_default_peer_port_count);
+
+int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
+{
+	if (pidx != NTB_DEF_PEER_IDX)
+		return -EINVAL;
+
+	switch (ntb->topo) {
+	case NTB_TOPO_PRI:
+	case NTB_TOPO_B2B_USD:
+		return NTB_PORT_SEC_DSD;
+	case NTB_TOPO_SEC:
+	case NTB_TOPO_B2B_DSD:
+		return NTB_PORT_PRI_USD;
+	default:
+		break;
+	}
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(ntb_default_peer_port_number);
+
+int ntb_default_peer_port_idx(struct ntb_dev *ntb, int port)
+{
+	int peer_port = ntb_default_peer_port_number(ntb, NTB_DEF_PEER_IDX);
+
+	if (peer_port == -EINVAL || port != peer_port)
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL(ntb_default_peer_port_idx);
+
 static int ntb_probe(struct device *dev)
 {
 	struct ntb_dev *ntb;
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@ -95,6 +95,9 @@ MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");

 static struct dentry *nt_debugfs_dir;

+/* Only two-ports NTB devices are supported */
+#define PIDX		NTB_DEF_PEER_IDX
+
 struct ntb_queue_entry {
 	/* ntb_queue list reference */
 	struct list_head entry;
@ -670,7 +673,7 @@ static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
 	if (!mw->virt_addr)
 		return;

-	ntb_mw_clear_trans(nt->ndev, num_mw);
+	ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
 	dma_free_coherent(&pdev->dev, mw->buff_size,
 			  mw->virt_addr, mw->dma_addr);
 	mw->xlat_size = 0;
@ -727,7 +730,8 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
 	}

 	/* Notify HW the memory location of the receive buffer */
-	rc = ntb_mw_set_trans(nt->ndev, num_mw, mw->dma_addr, mw->xlat_size);
+	rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
+			      mw->xlat_size);
 	if (rc) {
 		dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
 		ntb_free_mw(nt, num_mw);
@ -858,17 +862,17 @@ static void ntb_transport_link_work(struct work_struct *work)
 			size = max_mw_size;

 		spad = MW0_SZ_HIGH + (i * 2);
-		ntb_peer_spad_write(ndev, spad, upper_32_bits(size));
+		ntb_peer_spad_write(ndev, PIDX, spad, upper_32_bits(size));

 		spad = MW0_SZ_LOW + (i * 2);
-		ntb_peer_spad_write(ndev, spad, lower_32_bits(size));
+		ntb_peer_spad_write(ndev, PIDX, spad, lower_32_bits(size));
 	}

-	ntb_peer_spad_write(ndev, NUM_MWS, nt->mw_count);
+	ntb_peer_spad_write(ndev, PIDX, NUM_MWS, nt->mw_count);

-	ntb_peer_spad_write(ndev, NUM_QPS, nt->qp_count);
+	ntb_peer_spad_write(ndev, PIDX, NUM_QPS, nt->qp_count);

-	ntb_peer_spad_write(ndev, VERSION, NTB_TRANSPORT_VERSION);
+	ntb_peer_spad_write(ndev, PIDX, VERSION, NTB_TRANSPORT_VERSION);

 	/* Query the remote side for its info */
 	val = ntb_spad_read(ndev, VERSION);
@ -944,7 +948,7 @@ static void ntb_qp_link_work(struct work_struct *work)

 	val = ntb_spad_read(nt->ndev, QP_LINKS);

-	ntb_peer_spad_write(nt->ndev, QP_LINKS, val | BIT(qp->qp_num));
+	ntb_peer_spad_write(nt->ndev, PIDX, QP_LINKS, val | BIT(qp->qp_num));

 	/* query remote spad for qp ready bits */
 	dev_dbg_ratelimited(&pdev->dev, "Remote QP link status = %x\n", val);
@ -1055,7 +1059,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 	int node;
 	int rc, i;

-	mw_count = ntb_mw_count(ndev);
+	mw_count = ntb_mw_count(ndev, PIDX);
+
+	if (!ndev->ops->mw_set_trans) {
+		dev_err(&ndev->dev, "Inbound MW based NTB API is required\n");
+		return -EINVAL;
+	}

 	if (ntb_db_is_unsafe(ndev))
 		dev_dbg(&ndev->dev,
@ -1064,6 +1073,9 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 		dev_dbg(&ndev->dev,
 			"scratchpad is unsafe, proceed anyway...\n");

+	if (ntb_peer_port_count(ndev) != NTB_DEF_PEER_CNT)
+		dev_warn(&ndev->dev, "Multi-port NTB devices unsupported\n");
+
 	node = dev_to_node(&ndev->dev);

 	nt = kzalloc_node(sizeof(*nt), GFP_KERNEL, node);
@ -1094,8 +1106,13 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
 	for (i = 0; i < mw_count; i++) {
 		mw = &nt->mw_vec[i];

-		rc = ntb_mw_get_range(ndev, i, &mw->phys_addr, &mw->phys_size,
-				      &mw->xlat_align, &mw->xlat_align_size);
+		rc = ntb_mw_get_align(ndev, PIDX, i, &mw->xlat_align,
+				      &mw->xlat_align_size, NULL);
+		if (rc)
+			goto err1;
+
+		rc = ntb_peer_mw_get_addr(ndev, i, &mw->phys_addr,
+					  &mw->phys_size);
 		if (rc)
 			goto err1;

@ -2091,8 +2108,7 @@ void ntb_transport_link_down(struct ntb_transport_qp *qp)

 	val = ntb_spad_read(qp->ndev, QP_LINKS);

-	ntb_peer_spad_write(qp->ndev, QP_LINKS,
-			    val & ~BIT(qp->qp_num));
+	ntb_peer_spad_write(qp->ndev, PIDX, QP_LINKS, val & ~BIT(qp->qp_num));

 	if (qp->link_is_up)
 		ntb_send_link_down(qp);
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@ -76,6 +76,7 @@
 #define DMA_RETRIES		20
 #define SZ_4G			(1ULL << 32)
 #define MAX_SEG_ORDER		20 /* no larger than 1M for kmalloc buffer */
+#define PIDX			NTB_DEF_PEER_IDX

 MODULE_LICENSE(DRIVER_LICENSE);
 MODULE_VERSION(DRIVER_VERSION);
@ -100,6 +101,10 @@ static bool use_dma; /* default to 0 */
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Using DMA engine to measure performance");

+static bool on_node = true; /* default to 1 */
+module_param(on_node, bool, 0644);
+MODULE_PARM_DESC(on_node, "Run threads only on NTB device node (default: true)");
+
 struct perf_mw {
 	phys_addr_t	phys_addr;
 	resource_size_t	phys_size;
@ -135,9 +140,6 @@ struct perf_ctx {
 	bool			link_is_up;
 	struct delayed_work	link_work;
 	wait_queue_head_t	link_wq;
-	struct dentry		*debugfs_node_dir;
-	struct dentry		*debugfs_run;
-	struct dentry		*debugfs_threads;
 	u8			perf_threads;
 	/* mutex ensures only one set of threads run at once */
 	struct mutex		run_mutex;
@ -344,6 +346,10 @@ static int perf_move_data(struct pthr_ctx *pctx, char __iomem *dst, char *src,

 static bool perf_dma_filter_fn(struct dma_chan *chan, void *node)
 {
+	/* Is the channel required to be on the same node as the device? */
+	if (!on_node)
+		return true;
+
 	return dev_to_node(&chan->dev->device) == (int)(unsigned long)node;
 }

@ -361,7 +367,7 @@ static int ntb_perf_thread(void *data)

 	pr_debug("kthread %s starting...\n", current->comm);

-	node = dev_to_node(&pdev->dev);
+	node = on_node ? dev_to_node(&pdev->dev) : NUMA_NO_NODE;

 	if (use_dma && !pctx->dma_chan) {
 		dma_cap_mask_t dma_mask;
@ -454,7 +460,7 @@ static void perf_free_mw(struct perf_ctx *perf)
 	if (!mw->virt_addr)
 		return;

-	ntb_mw_clear_trans(perf->ntb, 0);
+	ntb_mw_clear_trans(perf->ntb, PIDX, 0);
 	dma_free_coherent(&pdev->dev, mw->buf_size,
 			  mw->virt_addr, mw->dma_addr);
 	mw->xlat_size = 0;
@ -490,7 +496,7 @@ static int perf_set_mw(struct perf_ctx *perf, resource_size_t size)
 		mw->buf_size = 0;
 	}

-	rc = ntb_mw_set_trans(perf->ntb, 0, mw->dma_addr, mw->xlat_size);
+	rc = ntb_mw_set_trans(perf->ntb, PIDX, 0, mw->dma_addr, mw->xlat_size);
 	if (rc) {
 		dev_err(&perf->ntb->dev, "Unable to set mw0 translation\n");
 		perf_free_mw(perf);
@ -517,9 +523,9 @@ static void perf_link_work(struct work_struct *work)
 	if (max_mw_size && size > max_mw_size)
 		size = max_mw_size;

-	ntb_peer_spad_write(ndev, MW_SZ_HIGH, upper_32_bits(size));
-	ntb_peer_spad_write(ndev, MW_SZ_LOW, lower_32_bits(size));
-	ntb_peer_spad_write(ndev, VERSION, PERF_VERSION);
+	ntb_peer_spad_write(ndev, PIDX, MW_SZ_HIGH, upper_32_bits(size));
+	ntb_peer_spad_write(ndev, PIDX, MW_SZ_LOW, lower_32_bits(size));
+	ntb_peer_spad_write(ndev, PIDX, VERSION, PERF_VERSION);

 	/* now read what peer wrote */
 	val = ntb_spad_read(ndev, VERSION);
@ -561,8 +567,12 @@ static int perf_setup_mw(struct ntb_dev *ntb, struct perf_ctx *perf)

 	mw = &perf->mw;

-	rc = ntb_mw_get_range(ntb, 0, &mw->phys_addr, &mw->phys_size,
-			      &mw->xlat_align, &mw->xlat_align_size);
+	rc = ntb_mw_get_align(ntb, PIDX, 0, &mw->xlat_align,
+			      &mw->xlat_align_size, NULL);
+	if (rc)
+		return rc;
+
+	rc = ntb_peer_mw_get_addr(ntb, 0, &mw->phys_addr, &mw->phys_size);
 	if (rc)
 		return rc;

@ -677,7 +687,8 @@ static ssize_t debugfs_run_write(struct file *filp, const char __user *ubuf,
 		pr_info("Fix run_order to %u\n", run_order);
 	}

-	node = dev_to_node(&perf->ntb->pdev->dev);
+	node = on_node ? dev_to_node(&perf->ntb->pdev->dev)
+		       : NUMA_NO_NODE;
 	atomic_set(&perf->tdone, 0);

 	/* launch kernel thread */
@ -723,34 +734,71 @@ static const struct file_operations ntb_perf_debugfs_run = {
 static int perf_debugfs_setup(struct perf_ctx *perf)
 {
 	struct pci_dev *pdev = perf->ntb->pdev;
+	struct dentry *debugfs_node_dir;
+	struct dentry *debugfs_run;
+	struct dentry *debugfs_threads;
+	struct dentry *debugfs_seg_order;
+	struct dentry *debugfs_run_order;
+	struct dentry *debugfs_use_dma;
+	struct dentry *debugfs_on_node;

 	if (!debugfs_initialized())
 		return -ENODEV;

+	/* Assumpion: only one NTB device in the system */
 	if (!perf_debugfs_dir) {
 		perf_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
 		if (!perf_debugfs_dir)
 			return -ENODEV;
 	}

-	perf->debugfs_node_dir = debugfs_create_dir(pci_name(pdev),
-						    perf_debugfs_dir);
-	if (!perf->debugfs_node_dir)
-		return -ENODEV;
+	debugfs_node_dir = debugfs_create_dir(pci_name(pdev),
+					      perf_debugfs_dir);
+	if (!debugfs_node_dir)
+		goto err;

-	perf->debugfs_run = debugfs_create_file("run", S_IRUSR | S_IWUSR,
-						perf->debugfs_node_dir, perf,
-						&ntb_perf_debugfs_run);
-	if (!perf->debugfs_run)
-		return -ENODEV;
+	debugfs_run = debugfs_create_file("run", S_IRUSR | S_IWUSR,
+					  debugfs_node_dir, perf,
+					  &ntb_perf_debugfs_run);
+	if (!debugfs_run)
+		goto err;

-	perf->debugfs_threads = debugfs_create_u8("threads", S_IRUSR | S_IWUSR,
-						  perf->debugfs_node_dir,
-						  &perf->perf_threads);
-	if (!perf->debugfs_threads)
-		return -ENODEV;
+	debugfs_threads = debugfs_create_u8("threads", S_IRUSR | S_IWUSR,
+					    debugfs_node_dir,
+					    &perf->perf_threads);
+	if (!debugfs_threads)
+		goto err;
+
+	debugfs_seg_order = debugfs_create_u32("seg_order", 0600,
+					       debugfs_node_dir,
+					       &seg_order);
+	if (!debugfs_seg_order)
+		goto err;
+
+	debugfs_run_order = debugfs_create_u32("run_order", 0600,
+					       debugfs_node_dir,
+					       &run_order);
+	if (!debugfs_run_order)
+		goto err;
+
+	debugfs_use_dma = debugfs_create_bool("use_dma", 0600,
+					       debugfs_node_dir,
+					       &use_dma);
+	if (!debugfs_use_dma)
+		goto err;
+
+	debugfs_on_node = debugfs_create_bool("on_node", 0600,
+					      debugfs_node_dir,
+					      &on_node);
+	if (!debugfs_on_node)
+		goto err;

 	return 0;
+
+err:
+	debugfs_remove_recursive(perf_debugfs_dir);
+	perf_debugfs_dir = NULL;
+	return -ENODEV;
 }

 static int perf_probe(struct ntb_client *client, struct ntb_dev *ntb)
@ -766,8 +814,15 @@ static int perf_probe(struct ntb_client *client, struct ntb_dev *ntb)
 		return -EIO;
 	}

-	node = dev_to_node(&pdev->dev);
+	if (!ntb->ops->mw_set_trans) {
+		dev_err(&ntb->dev, "Need inbound MW based NTB API\n");
+		return -EINVAL;
+	}

+	if (ntb_peer_port_count(ntb) != NTB_DEF_PEER_CNT)
+		dev_warn(&ntb->dev, "Multi-port NTB devices unsupported\n");
+
+	node = on_node ? dev_to_node(&pdev->dev) : NUMA_NO_NODE;
 	perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, node);
 	if (!perf) {
 		rc = -ENOMEM;
--- a/drivers/ntb/test/ntb_pingpong.c
+++ b/drivers/ntb/test/ntb_pingpong.c
@ -90,6 +90,9 @@ static unsigned long db_init = 0x7;
 module_param(db_init, ulong, 0644);
 MODULE_PARM_DESC(db_init, "Initial doorbell bits to ring on the peer");

+/* Only two-ports NTB devices are supported */
+#define PIDX		NTB_DEF_PEER_IDX
+
 struct pp_ctx {
 	struct ntb_dev			*ntb;
 	u64				db_bits;
@ -135,7 +138,7 @@ static void pp_ping(unsigned long ctx)
 			"Ping bits %#llx read %#x write %#x\n",
 			db_bits, spad_rd, spad_wr);

-		ntb_peer_spad_write(pp->ntb, 0, spad_wr);
+		ntb_peer_spad_write(pp->ntb, PIDX, 0, spad_wr);
 		ntb_peer_db_set(pp->ntb, db_bits);
 		ntb_db_clear_mask(pp->ntb, db_mask);

@ -222,6 +225,12 @@ static int pp_probe(struct ntb_client *client,
 		}
 	}

+	if (ntb_spad_count(ntb) < 1) {
+		dev_dbg(&ntb->dev, "no enough scratchpads\n");
+		rc = -EINVAL;
+		goto err_pp;
+	}
+
 	if (ntb_spad_is_unsafe(ntb)) {
 		dev_dbg(&ntb->dev, "scratchpad is unsafe\n");
 		if (!unsafe) {
@ -230,6 +239,9 @@ static int pp_probe(struct ntb_client *client,
 		}
 	}

+	if (ntb_peer_port_count(ntb) != NTB_DEF_PEER_CNT)
+		dev_warn(&ntb->dev, "multi-port NTB is unsupported\n");
+
 	pp = kmalloc(sizeof(*pp), GFP_KERNEL);
 	if (!pp) {
 		rc = -ENOMEM;
--- a/drivers/ntb/test/ntb_tool.c
+++ b/drivers/ntb/test/ntb_tool.c
@ -119,7 +119,10 @@ MODULE_VERSION(DRIVER_VERSION);
 MODULE_AUTHOR(DRIVER_AUTHOR);
 MODULE_DESCRIPTION(DRIVER_DESCRIPTION);

-#define MAX_MWS 16
+/* It is rare to have hadrware with greater than six MWs */
+#define MAX_MWS	6
+/* Only two-ports devices are supported */
+#define PIDX	NTB_DEF_PEER_IDX

 static struct dentry *tool_dbgfs;

@ -459,13 +462,22 @@ static TOOL_FOPS_RDWR(tool_spad_fops,
 		      tool_spad_read,
 		      tool_spad_write);

+static u32 ntb_tool_peer_spad_read(struct ntb_dev *ntb, int sidx)
+{
+	return ntb_peer_spad_read(ntb, PIDX, sidx);
+}
+
 static ssize_t tool_peer_spad_read(struct file *filep, char __user *ubuf,
 				   size_t size, loff_t *offp)
 {
 	struct tool_ctx *tc = filep->private_data;

-	return tool_spadfn_read(tc, ubuf, size, offp,
-				tc->ntb->ops->peer_spad_read);
+	return tool_spadfn_read(tc, ubuf, size, offp, ntb_tool_peer_spad_read);
+}
+
+static int ntb_tool_peer_spad_write(struct ntb_dev *ntb, int sidx, u32 val)
+{
+	return ntb_peer_spad_write(ntb, PIDX, sidx, val);
 }

 static ssize_t tool_peer_spad_write(struct file *filep, const char __user *ubuf,
@ -474,7 +486,7 @@ static ssize_t tool_peer_spad_write(struct file *filep, const char __user *ubuf,
 	struct tool_ctx *tc = filep->private_data;

 	return tool_spadfn_write(tc, ubuf, size, offp,
-				 tc->ntb->ops->peer_spad_write);
+				 ntb_tool_peer_spad_write);
 }

 static TOOL_FOPS_RDWR(tool_peer_spad_fops,
@ -668,28 +680,27 @@ static int tool_setup_mw(struct tool_ctx *tc, int idx, size_t req_size)
 {
 	int rc;
 	struct tool_mw *mw = &tc->mws[idx];
-	phys_addr_t base;
-	resource_size_t size, align, align_size;
+	resource_size_t size, align_addr, align_size;
 	char buf[16];

 	if (mw->peer)
 		return 0;

-	rc = ntb_mw_get_range(tc->ntb, idx, &base, &size, &align,
-			      &align_size);
+	rc = ntb_mw_get_align(tc->ntb, PIDX, idx, &align_addr,
+				&align_size, &size);
 	if (rc)
 		return rc;

 	mw->size = min_t(resource_size_t, req_size, size);
-	mw->size = round_up(mw->size, align);
+	mw->size = round_up(mw->size, align_addr);
 	mw->size = round_up(mw->size, align_size);
 	mw->peer = dma_alloc_coherent(&tc->ntb->pdev->dev, mw->size,
 				      &mw->peer_dma, GFP_KERNEL);

-	if (!mw->peer)
+	if (!mw->peer || !IS_ALIGNED(mw->peer_dma, align_addr))
 		return -ENOMEM;

-	rc = ntb_mw_set_trans(tc->ntb, idx, mw->peer_dma, mw->size);
+	rc = ntb_mw_set_trans(tc->ntb, PIDX, idx, mw->peer_dma, mw->size);
 	if (rc)
 		goto err_free_dma;

@ -716,7 +727,7 @@ static void tool_free_mw(struct tool_ctx *tc, int idx)
 	struct tool_mw *mw = &tc->mws[idx];

 	if (mw->peer) {
-		ntb_mw_clear_trans(tc->ntb, idx);
+		ntb_mw_clear_trans(tc->ntb, PIDX, idx);
 		dma_free_coherent(&tc->ntb->pdev->dev, mw->size,
 				  mw->peer,
 				  mw->peer_dma);
@ -742,8 +753,9 @@ static ssize_t tool_peer_mw_trans_read(struct file *filep,

 	phys_addr_t base;
 	resource_size_t mw_size;
-	resource_size_t align;
+	resource_size_t align_addr;
 	resource_size_t align_size;
+	resource_size_t max_size;

 	buf_size = min_t(size_t, size, 512);

@ -751,8 +763,9 @@ static ssize_t tool_peer_mw_trans_read(struct file *filep,
 	if (!buf)
 		return -ENOMEM;

-	ntb_mw_get_range(mw->tc->ntb, mw->idx,
-			 &base, &mw_size, &align, &align_size);
+	ntb_mw_get_align(mw->tc->ntb, PIDX, mw->idx,
+			 &align_addr, &align_size, &max_size);
+	ntb_peer_mw_get_addr(mw->tc->ntb, mw->idx, &base, &mw_size);

 	off += scnprintf(buf + off, buf_size - off,
 			 "Peer MW %d Information:\n", mw->idx);
@ -767,12 +780,16 @@ static ssize_t tool_peer_mw_trans_read(struct file *filep,

 	off += scnprintf(buf + off, buf_size - off,
 			 "Alignment             \t%lld\n",
-			 (unsigned long long)align);
+			 (unsigned long long)align_addr);

 	off += scnprintf(buf + off, buf_size - off,
 			 "Size Alignment        \t%lld\n",
 			 (unsigned long long)align_size);

+	off += scnprintf(buf + off, buf_size - off,
+			 "Size Max              \t%lld\n",
+			 (unsigned long long)max_size);
+
 	off += scnprintf(buf + off, buf_size - off,
 			 "Ready                 \t%c\n",
 			 (mw->peer) ? 'Y' : 'N');
@ -827,8 +844,7 @@ static int tool_init_mw(struct tool_ctx *tc, int idx)
 	phys_addr_t base;
 	int rc;

-	rc = ntb_mw_get_range(tc->ntb, idx, &base, &mw->win_size,
-			      NULL, NULL);
+	rc = ntb_peer_mw_get_addr(tc->ntb, idx, &base, &mw->win_size);
 	if (rc)
 		return rc;

@ -913,12 +929,27 @@ static int tool_probe(struct ntb_client *self, struct ntb_dev *ntb)
 	int rc;
 	int i;

+	if (!ntb->ops->mw_set_trans) {
+		dev_dbg(&ntb->dev, "need inbound MW based NTB API\n");
+		rc = -EINVAL;
+		goto err_tc;
+	}
+
+	if (ntb_spad_count(ntb) < 1) {
+		dev_dbg(&ntb->dev, "no enough scratchpads\n");
+		rc = -EINVAL;
+		goto err_tc;
+	}
+
 	if (ntb_db_is_unsafe(ntb))
 		dev_dbg(&ntb->dev, "doorbell is unsafe\n");

 	if (ntb_spad_is_unsafe(ntb))
 		dev_dbg(&ntb->dev, "scratchpad is unsafe\n");

+	if (ntb_peer_port_count(ntb) != NTB_DEF_PEER_CNT)
+		dev_warn(&ntb->dev, "multi-port NTB is unsupported\n");
+
 	tc = kzalloc(sizeof(*tc), GFP_KERNEL);
 	if (!tc) {
 		rc = -ENOMEM;
@ -928,7 +959,7 @@ static int tool_probe(struct ntb_client *self, struct ntb_dev *ntb)
 	tc->ntb = ntb;
 	init_waitqueue_head(&tc->link_wq);

-	tc->mw_count = min(ntb_mw_count(tc->ntb), MAX_MWS);
+	tc->mw_count = min(ntb_mw_count(tc->ntb, PIDX), MAX_MWS);
 	for (i = 0; i < tc->mw_count; i++) {
 		rc = tool_init_mw(tc, i);
 		if (rc)
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@ -5,6 +5,7 @@
 *   GPL LICENSE SUMMARY
 *
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   This program is free software; you can redistribute it and/or modify
 *   it under the terms of version 2 of the GNU General Public License as
@ -18,6 +19,7 @@
 *   BSD LICENSE
 *
 *   Copyright (C) 2015 EMC Corporation. All Rights Reserved.
+ *   Copyright (C) 2016 T-Platforms. All Rights Reserved.
 *
 *   Redistribution and use in source and binary forms, with or without
 *   modification, are permitted provided that the following conditions
@ -106,6 +108,7 @@ static inline char *ntb_topo_string(enum ntb_topo topo)
 * @NTB_SPEED_GEN1:	Link is trained to gen1 speed.
 * @NTB_SPEED_GEN2:	Link is trained to gen2 speed.
 * @NTB_SPEED_GEN3:	Link is trained to gen3 speed.
+ * @NTB_SPEED_GEN4:	Link is trained to gen4 speed.
 */
 enum ntb_speed {
 	NTB_SPEED_AUTO = -1,
@ -113,6 +116,7 @@ enum ntb_speed {
 	NTB_SPEED_GEN1 = 1,
 	NTB_SPEED_GEN2 = 2,
 	NTB_SPEED_GEN3 = 3,
+	NTB_SPEED_GEN4 = 4
 };

 /**
@ -139,6 +143,20 @@ enum ntb_width {
 	NTB_WIDTH_32 = 32,
 };

+/**
+ * enum ntb_default_port - NTB default port number
+ * @NTB_PORT_PRI_USD:	Default port of the NTB_TOPO_PRI/NTB_TOPO_B2B_USD
+ *			topologies
+ * @NTB_PORT_SEC_DSD:	Default port of the NTB_TOPO_SEC/NTB_TOPO_B2B_DSD
+ *			topologies
+ */
+enum ntb_default_port {
+	NTB_PORT_PRI_USD,
+	NTB_PORT_SEC_DSD
+};
+#define NTB_DEF_PEER_CNT	(1)
+#define NTB_DEF_PEER_IDX	(0)
+
 /**
 * struct ntb_client_ops - ntb client operations
 * @probe:		Notify client of a new device.
@ -162,10 +180,12 @@ static inline int ntb_client_ops_is_valid(const struct ntb_client_ops *ops)
 * struct ntb_ctx_ops - ntb driver context operations
 * @link_event:		See ntb_link_event().
 * @db_event:		See ntb_db_event().
+ * @msg_event:		See ntb_msg_event().
 */
 struct ntb_ctx_ops {
 	void (*link_event)(void *ctx);
 	void (*db_event)(void *ctx, int db_vector);
+	void (*msg_event)(void *ctx);
 };

 static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
@ -174,18 +194,27 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
 	return
 		/* ops->link_event		&& */
 		/* ops->db_event		&& */
+		/* ops->msg_event		&& */
 		1;
 }

 /**
 * struct ntb_ctx_ops - ntb device operations
- * @mw_count:		See ntb_mw_count().
- * @mw_get_range:	See ntb_mw_get_range().
- * @mw_set_trans:	See ntb_mw_set_trans().
- * @mw_clear_trans:	See ntb_mw_clear_trans().
+ * @port_number:	See ntb_port_number().
+ * @peer_port_count:	See ntb_peer_port_count().
+ * @peer_port_number:	See ntb_peer_port_number().
+ * @peer_port_idx:	See ntb_peer_port_idx().
 * @link_is_up:		See ntb_link_is_up().
 * @link_enable:	See ntb_link_enable().
 * @link_disable:	See ntb_link_disable().
+ * @mw_count:		See ntb_mw_count().
+ * @mw_get_align:	See ntb_mw_get_align().
+ * @mw_set_trans:	See ntb_mw_set_trans().
+ * @mw_clear_trans:	See ntb_mw_clear_trans().
+ * @peer_mw_count:	See ntb_peer_mw_count().
+ * @peer_mw_get_addr:	See ntb_peer_mw_get_addr().
+ * @peer_mw_set_trans:	See ntb_peer_mw_set_trans().
+ * @peer_mw_clear_trans:See ntb_peer_mw_clear_trans().
 * @db_is_unsafe:	See ntb_db_is_unsafe().
 * @db_valid_mask:	See ntb_db_valid_mask().
 * @db_vector_count:	See ntb_db_vector_count().
@ -210,22 +239,43 @@ static inline int ntb_ctx_ops_is_valid(const struct ntb_ctx_ops *ops)
 * @peer_spad_addr:	See ntb_peer_spad_addr().
 * @peer_spad_read:	See ntb_peer_spad_read().
 * @peer_spad_write:	See ntb_peer_spad_write().
+ * @msg_count:		See ntb_msg_count().
+ * @msg_inbits:		See ntb_msg_inbits().
+ * @msg_outbits:	See ntb_msg_outbits().
+ * @msg_read_sts:	See ntb_msg_read_sts().
+ * @msg_clear_sts:	See ntb_msg_clear_sts().
+ * @msg_set_mask:	See ntb_msg_set_mask().
+ * @msg_clear_mask:	See ntb_msg_clear_mask().
+ * @msg_read:		See ntb_msg_read().
+ * @msg_write:		See ntb_msg_write().
 */
 struct ntb_dev_ops {
-	int (*mw_count)(struct ntb_dev *ntb);
-	int (*mw_get_range)(struct ntb_dev *ntb, int idx,
-			    phys_addr_t *base, resource_size_t *size,
-			resource_size_t *align, resource_size_t *align_size);
-	int (*mw_set_trans)(struct ntb_dev *ntb, int idx,
-			    dma_addr_t addr, resource_size_t size);
-	int (*mw_clear_trans)(struct ntb_dev *ntb, int idx);
+	int (*port_number)(struct ntb_dev *ntb);
+	int (*peer_port_count)(struct ntb_dev *ntb);
+	int (*peer_port_number)(struct ntb_dev *ntb, int pidx);
+	int (*peer_port_idx)(struct ntb_dev *ntb, int port);

-	int (*link_is_up)(struct ntb_dev *ntb,
+	u64 (*link_is_up)(struct ntb_dev *ntb,
 			  enum ntb_speed *speed, enum ntb_width *width);
 	int (*link_enable)(struct ntb_dev *ntb,
 			   enum ntb_speed max_speed, enum ntb_width max_width);
 	int (*link_disable)(struct ntb_dev *ntb);

+	int (*mw_count)(struct ntb_dev *ntb, int pidx);
+	int (*mw_get_align)(struct ntb_dev *ntb, int pidx, int widx,
+			    resource_size_t *addr_align,
+			    resource_size_t *size_align,
+			    resource_size_t *size_max);
+	int (*mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
+			    dma_addr_t addr, resource_size_t size);
+	int (*mw_clear_trans)(struct ntb_dev *ntb, int pidx, int widx);
+	int (*peer_mw_count)(struct ntb_dev *ntb);
+	int (*peer_mw_get_addr)(struct ntb_dev *ntb, int widx,
+				phys_addr_t *base, resource_size_t *size);
+	int (*peer_mw_set_trans)(struct ntb_dev *ntb, int pidx, int widx,
+				 u64 addr, resource_size_t size);
+	int (*peer_mw_clear_trans)(struct ntb_dev *ntb, int pidx, int widx);
+
 	int (*db_is_unsafe)(struct ntb_dev *ntb);
 	u64 (*db_valid_mask)(struct ntb_dev *ntb);
 	int (*db_vector_count)(struct ntb_dev *ntb);
@ -252,32 +302,55 @@ struct ntb_dev_ops {
 	int (*spad_is_unsafe)(struct ntb_dev *ntb);
 	int (*spad_count)(struct ntb_dev *ntb);

-	u32 (*spad_read)(struct ntb_dev *ntb, int idx);
-	int (*spad_write)(struct ntb_dev *ntb, int idx, u32 val);
+	u32 (*spad_read)(struct ntb_dev *ntb, int sidx);
+	int (*spad_write)(struct ntb_dev *ntb, int sidx, u32 val);

-	int (*peer_spad_addr)(struct ntb_dev *ntb, int idx,
+	int (*peer_spad_addr)(struct ntb_dev *ntb, int pidx, int sidx,
 			      phys_addr_t *spad_addr);
-	u32 (*peer_spad_read)(struct ntb_dev *ntb, int idx);
-	int (*peer_spad_write)(struct ntb_dev *ntb, int idx, u32 val);
+	u32 (*peer_spad_read)(struct ntb_dev *ntb, int pidx, int sidx);
+	int (*peer_spad_write)(struct ntb_dev *ntb, int pidx, int sidx,
+			       u32 val);
+
+	int (*msg_count)(struct ntb_dev *ntb);
+	u64 (*msg_inbits)(struct ntb_dev *ntb);
+	u64 (*msg_outbits)(struct ntb_dev *ntb);
+	u64 (*msg_read_sts)(struct ntb_dev *ntb);
+	int (*msg_clear_sts)(struct ntb_dev *ntb, u64 sts_bits);
+	int (*msg_set_mask)(struct ntb_dev *ntb, u64 mask_bits);
+	int (*msg_clear_mask)(struct ntb_dev *ntb, u64 mask_bits);
+	int (*msg_read)(struct ntb_dev *ntb, int midx, int *pidx, u32 *msg);
+	int (*msg_write)(struct ntb_dev *ntb, int midx, int pidx, u32 msg);
 };

 static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
 {
 	/* commented callbacks are not required: */
 	return
-		ops->mw_count				&&
-		ops->mw_get_range			&&
-		ops->mw_set_trans			&&
-		/* ops->mw_clear_trans			&& */
+		/* Port operations are required for multiport devices */
+		!ops->peer_port_count == !ops->port_number	&&
+		!ops->peer_port_number == !ops->port_number	&&
+		!ops->peer_port_idx == !ops->port_number	&&
+
+		/* Link operations are required */
 		ops->link_is_up				&&
 		ops->link_enable			&&
 		ops->link_disable			&&
+
+		/* One or both MW interfaces should be developed */
+		ops->mw_count				&&
+		ops->mw_get_align			&&
+		(ops->mw_set_trans			||
+		 ops->peer_mw_set_trans)		&&
+		/* ops->mw_clear_trans			&& */
+		ops->peer_mw_count			&&
+		ops->peer_mw_get_addr			&&
+		/* ops->peer_mw_clear_trans		&& */
+
+		/* Doorbell operations are mostly required */
 		/* ops->db_is_unsafe			&& */
 		ops->db_valid_mask			&&
-
 		/* both set, or both unset */
-		(!ops->db_vector_count == !ops->db_vector_mask) &&
-
+		(!ops->db_vector_count == !ops->db_vector_mask)	&&
 		ops->db_read				&&
 		/* ops->db_set				&& */
 		ops->db_clear				&&
@ -291,13 +364,24 @@ static inline int ntb_dev_ops_is_valid(const struct ntb_dev_ops *ops)
 		/* ops->peer_db_read_mask		&& */
 		/* ops->peer_db_set_mask		&& */
 		/* ops->peer_db_clear_mask		&& */
-		/* ops->spad_is_unsafe			&& */
-		ops->spad_count				&&
-		ops->spad_read				&&
-		ops->spad_write				&&
-		/* ops->peer_spad_addr			&& */
-		/* ops->peer_spad_read			&& */
-		ops->peer_spad_write			&&
+
+		/* Scrachpads interface is optional */
+		/* !ops->spad_is_unsafe == !ops->spad_count	&& */
+		!ops->spad_read == !ops->spad_count		&&
+		!ops->spad_write == !ops->spad_count		&&
+		/* !ops->peer_spad_addr == !ops->spad_count	&& */
+		/* !ops->peer_spad_read == !ops->spad_count	&& */
+		!ops->peer_spad_write == !ops->spad_count	&&
+
+		/* Messaging interface is optional */
+		!ops->msg_inbits == !ops->msg_count		&&
+		!ops->msg_outbits == !ops->msg_count		&&
+		!ops->msg_read_sts == !ops->msg_count		&&
+		!ops->msg_clear_sts == !ops->msg_count		&&
+		/* !ops->msg_set_mask == !ops->msg_count	&& */
+		/* !ops->msg_clear_mask == !ops->msg_count	&& */
+		!ops->msg_read == !ops->msg_count		&&
+		!ops->msg_write == !ops->msg_count		&&
 		1;
 }

@ -310,13 +394,12 @@ struct ntb_client {
 	struct device_driver		drv;
 	const struct ntb_client_ops	ops;
 };
-
 #define drv_ntb_client(__drv) container_of((__drv), struct ntb_client, drv)

 /**
 * struct ntb_device - ntb device
 * @dev:		Linux device object.
- * @pdev:		Pci device entry of the ntb.
+ * @pdev:		PCI device entry of the ntb.
 * @topo:		Detected topology of the ntb.
 * @ops:		See &ntb_dev_ops.
 * @ctx:		See &ntb_ctx_ops.
@ -337,7 +420,6 @@ struct ntb_dev {
 	/* block unregister until device is fully released */
 	struct completion		released;
 };
-
 #define dev_ntb(__dev) container_of((__dev), struct ntb_dev, dev)

 /**
@ -434,86 +516,152 @@ void ntb_link_event(struct ntb_dev *ntb);
 * multiple interrupt vectors for doorbells, the vector number indicates which
 * vector received the interrupt.  The vector number is relative to the first
 * vector used for doorbells, starting at zero, and must be less than
- ** ntb_db_vector_count().  The driver may call ntb_db_read() to check which
+ * ntb_db_vector_count().  The driver may call ntb_db_read() to check which
 * doorbell bits need service, and ntb_db_vector_mask() to determine which of
 * those bits are associated with the vector number.
 */
 void ntb_db_event(struct ntb_dev *ntb, int vector);

 /**
- * ntb_mw_count() - get the number of memory windows
+ * ntb_msg_event() - notify driver context of a message event
 * @ntb:	NTB device context.
 *
- * Hardware and topology may support a different number of memory windows.
- *
- * Return: the number of memory windows.
+ * Notify the driver context of a message event.  If hardware supports
+ * message registers, this event indicates, that a new message arrived in
+ * some incoming message register or last sent message couldn't be delivered.
+ * The events can be masked/unmasked by the methods ntb_msg_set_mask() and
+ * ntb_msg_clear_mask().
 */
-static inline int ntb_mw_count(struct ntb_dev *ntb)
+void ntb_msg_event(struct ntb_dev *ntb);
+
+/**
+ * ntb_default_port_number() - get the default local port number
+ * @ntb:	NTB device context.
+ *
+ * If hardware driver doesn't specify port_number() callback method, the NTB
+ * is considered with just two ports. So this method returns default local
+ * port number in compliance with topology.
+ *
+ * NOTE Don't call this method directly. The ntb_port_number() function should
+ * be used instead.
+ *
+ * Return: the default local port number
+ */
+int ntb_default_port_number(struct ntb_dev *ntb);
+
+/**
+ * ntb_default_port_count() - get the default number of peer device ports
+ * @ntb:	NTB device context.
+ *
+ * By default hardware driver supports just one peer device.
+ *
+ * NOTE Don't call this method directly. The ntb_peer_port_count() function
+ * should be used instead.
+ *
+ * Return: the default number of peer ports
+ */
+int ntb_default_peer_port_count(struct ntb_dev *ntb);
+
+/**
+ * ntb_default_peer_port_number() - get the default peer port by given index
+ * @ntb:	NTB device context.
+ * @idx:	Peer port index (should not differ from zero).
+ *
+ * By default hardware driver supports just one peer device, so this method
+ * shall return the corresponding value from enum ntb_default_port.
+ *
+ * NOTE Don't call this method directly. The ntb_peer_port_number() function
+ * should be used instead.
+ *
+ * Return: the peer device port or negative value indicating an error
+ */
+int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx);
+
+/**
+ * ntb_default_peer_port_idx() - get the default peer device port index by
+ *				 given port number
+ * @ntb:	NTB device context.
+ * @port:	Peer port number (should be one of enum ntb_default_port).
+ *
+ * By default hardware driver supports just one peer device, so while
+ * specified port-argument indicates peer port from enum ntb_default_port,
+ * the return value shall be zero.
+ *
+ * NOTE Don't call this method directly. The ntb_peer_port_idx() function
+ * should be used instead.
+ *
+ * Return: the peer port index or negative value indicating an error
+ */
+int ntb_default_peer_port_idx(struct ntb_dev *ntb, int port);
+
+/**
+ * ntb_port_number() - get the local port number
+ * @ntb:	NTB device context.
+ *
+ * Hardware must support at least simple two-ports ntb connection
+ *
+ * Return: the local port number
+ */
+static inline int ntb_port_number(struct ntb_dev *ntb)
 {
-	return ntb->ops->mw_count(ntb);
+	if (!ntb->ops->port_number)
+		return ntb_default_port_number(ntb);
+
+	return ntb->ops->port_number(ntb);
 }

 /**
- * ntb_mw_get_range() - get the range of a memory window
+ * ntb_peer_port_count() - get the number of peer device ports
 * @ntb:	NTB device context.
- * @idx:	Memory window number.
- * @base:	OUT - the base address for mapping the memory window
- * @size:	OUT - the size for mapping the memory window
- * @align:	OUT - the base alignment for translating the memory window
- * @align_size:	OUT - the size alignment for translating the memory window
 *
- * Get the range of a memory window.  NULL may be given for any output
- * parameter if the value is not needed.  The base and size may be used for
- * mapping the memory window, to access the peer memory.  The alignment and
- * size may be used for translating the memory window, for the peer to access
- * memory on the local system.
+ * Hardware may support an access to memory of several remote domains
+ * over multi-port NTB devices. This method returns the number of peers,
+ * local device can have shared memory with.
 *
- * Return: Zero on success, otherwise an error number.
+ * Return: the number of peer ports
 */
-static inline int ntb_mw_get_range(struct ntb_dev *ntb, int idx,
-				   phys_addr_t *base, resource_size_t *size,
-		resource_size_t *align, resource_size_t *align_size)
+static inline int ntb_peer_port_count(struct ntb_dev *ntb)
 {
-	return ntb->ops->mw_get_range(ntb, idx, base, size,
-			align, align_size);
+	if (!ntb->ops->peer_port_count)
+		return ntb_default_peer_port_count(ntb);
+
+	return ntb->ops->peer_port_count(ntb);
 }

 /**
- * ntb_mw_set_trans() - set the translation of a memory window
+ * ntb_peer_port_number() - get the peer port by given index
 * @ntb:	NTB device context.
- * @idx:	Memory window number.
- * @addr:	The dma address local memory to expose to the peer.
- * @size:	The size of the local memory to expose to the peer.
+ * @pidx:	Peer port index.
 *
- * Set the translation of a memory window.  The peer may access local memory
- * through the window starting at the address, up to the size.  The address
- * must be aligned to the alignment specified by ntb_mw_get_range().  The size
- * must be aligned to the size alignment specified by ntb_mw_get_range().
+ * Peer ports are continuously enumerated by NTB API logic, so this method
+ * lets to retrieve port real number by its index.
 *
- * Return: Zero on success, otherwise an error number.
+ * Return: the peer device port or negative value indicating an error
 */
-static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
-				   dma_addr_t addr, resource_size_t size)
+static inline int ntb_peer_port_number(struct ntb_dev *ntb, int pidx)
 {
-	return ntb->ops->mw_set_trans(ntb, idx, addr, size);
+	if (!ntb->ops->peer_port_number)
+		return ntb_default_peer_port_number(ntb, pidx);
+
+	return ntb->ops->peer_port_number(ntb, pidx);
 }

 /**
- * ntb_mw_clear_trans() - clear the translation of a memory window
+ * ntb_peer_port_idx() - get the peer device port index by given port number
 * @ntb:	NTB device context.
- * @idx:	Memory window number.
+ * @port:	Peer port number.
 *
- * Clear the translation of a memory window.  The peer may no longer access
- * local memory through the window.
+ * Inverse operation of ntb_peer_port_number(), so one can get port index
+ * by specified port number.
 *
- * Return: Zero on success, otherwise an error number.
+ * Return: the peer port index or negative value indicating an error
 */
-static inline int ntb_mw_clear_trans(struct ntb_dev *ntb, int idx)
+static inline int ntb_peer_port_idx(struct ntb_dev *ntb, int port)
 {
-	if (!ntb->ops->mw_clear_trans)
-		return ntb->ops->mw_set_trans(ntb, idx, 0, 0);
+	if (!ntb->ops->peer_port_idx)
+		return ntb_default_peer_port_idx(ntb, port);

-	return ntb->ops->mw_clear_trans(ntb, idx);
+	return ntb->ops->peer_port_idx(ntb, port);
 }

 /**
@ -526,25 +674,26 @@ static inline int ntb_mw_clear_trans(struct ntb_dev *ntb, int idx)
 * state once after every link event.  It is safe to query the link state in
 * the context of the link event callback.
 *
- * Return: One if the link is up, zero if the link is down, otherwise a
- *		negative value indicating the error number.
+ * Return: bitfield of indexed ports link state: bit is set/cleared if the
+ *         link is up/down respectively.
 */
-static inline int ntb_link_is_up(struct ntb_dev *ntb,
+static inline u64 ntb_link_is_up(struct ntb_dev *ntb,
 				 enum ntb_speed *speed, enum ntb_width *width)
 {
 	return ntb->ops->link_is_up(ntb, speed, width);
 }

 /**
- * ntb_link_enable() - enable the link on the secondary side of the ntb
+ * ntb_link_enable() - enable the local port ntb connection
 * @ntb:	NTB device context.
 * @max_speed:	The maximum link speed expressed as PCIe generation number.
 * @max_width:	The maximum link width expressed as the number of PCIe lanes.
 *
- * Enable the link on the secondary side of the ntb.  This can only be done
- * from the primary side of the ntb in primary or b2b topology.  The ntb device
- * should train the link to its maximum speed and width, or the requested speed
- * and width, whichever is smaller, if supported.
+ * Enable the NTB/PCIe link on the local or remote (for bridge-to-bridge
+ * topology) side of the bridge. If it's supported the ntb device should train
+ * the link to its maximum speed and width, or the requested speed and width,
+ * whichever is smaller. Some hardware doesn't support PCIe link training, so
+ * the last two arguments will be ignored then.
 *
 * Return: Zero on success, otherwise an error number.
 */
@ -556,14 +705,14 @@ static inline int ntb_link_enable(struct ntb_dev *ntb,
 }

 /**
- * ntb_link_disable() - disable the link on the secondary side of the ntb
+ * ntb_link_disable() - disable the local port ntb connection
 * @ntb:	NTB device context.
 *
- * Disable the link on the secondary side of the ntb.  This can only be
- * done from the primary side of the ntb in primary or b2b topology.  The ntb
- * device should disable the link.  Returning from this call must indicate that
- * a barrier has passed, though with no more writes may pass in either
- * direction across the link, except if this call returns an error number.
+ * Disable the link on the local or remote (for b2b topology) of the ntb.
+ * The ntb device should disable the link.  Returning from this call must
+ * indicate that a barrier has passed, though with no more writes may pass in
+ * either direction across the link, except if this call returns an error
+ * number.
 *
 * Return: Zero on success, otherwise an error number.
 */
@ -572,6 +721,183 @@ static inline int ntb_link_disable(struct ntb_dev *ntb)
 	return ntb->ops->link_disable(ntb);
 }

+/**
+ * ntb_mw_count() - get the number of inbound memory windows, which could
+ *                  be created for a specified peer device
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ *
+ * Hardware and topology may support a different number of memory windows.
+ * Moreover different peer devices can support different number of memory
+ * windows. Simply speaking this method returns the number of possible inbound
+ * memory windows to share with specified peer device.
+ *
+ * Return: the number of memory windows.
+ */
+static inline int ntb_mw_count(struct ntb_dev *ntb, int pidx)
+{
+	return ntb->ops->mw_count(ntb, pidx);
+}
+
+/**
+ * ntb_mw_get_align() - get the restriction parameters of inbound memory window
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ * @widx:	Memory window index.
+ * @addr_align:	OUT - the base alignment for translating the memory window
+ * @size_align:	OUT - the size alignment for translating the memory window
+ * @size_max:	OUT - the maximum size of the memory window
+ *
+ * Get the alignments of an inbound memory window with specified index.
+ * NULL may be given for any output parameter if the value is not needed.
+ * The alignment and size parameters may be used for allocation of proper
+ * shared memory.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int widx,
+				   resource_size_t *addr_align,
+				   resource_size_t *size_align,
+				   resource_size_t *size_max)
+{
+	return ntb->ops->mw_get_align(ntb, pidx, widx, addr_align, size_align,
+				      size_max);
+}
+
+/**
+ * ntb_mw_set_trans() - set the translation of an inbound memory window
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ * @widx:	Memory window index.
+ * @addr:	The dma address of local memory to expose to the peer.
+ * @size:	The size of the local memory to expose to the peer.
+ *
+ * Set the translation of a memory window.  The peer may access local memory
+ * through the window starting at the address, up to the size.  The address
+ * and size must be aligned in compliance with restrictions of
+ * ntb_mw_get_align(). The region size should not exceed the size_max parameter
+ * of that method.
+ *
+ * This method may not be implemented due to the hardware specific memory
+ * windows interface.
+ *
+ * Return: Zero on success, otherwise an error number.
+ */
+static inline int ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
+				   dma_addr_t addr, resource_size_t size)
+{
+	if (!ntb->ops->mw_set_trans)
+		return 0;
+
+	return ntb->ops->mw_set_trans(ntb, pidx, widx, addr, size);
+}
+
+/**
+ * ntb_mw_clear_trans() - clear the translation address of an inbound memory
+ *                        window
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ * @widx:	Memory window index.
+ *
+ * Clear the translation of an inbound memory window.  The peer may no longer
+ * access local memory through the window.
+ *
+ * Return: Zero on success, otherwise an error number.
+ */
+static inline int ntb_mw_clear_trans(struct ntb_dev *ntb, int pidx, int widx)
+{
+	if (!ntb->ops->mw_clear_trans)
+		return ntb_mw_set_trans(ntb, pidx, widx, 0, 0);
+
+	return ntb->ops->mw_clear_trans(ntb, pidx, widx);
+}
+
+/**
+ * ntb_peer_mw_count() - get the number of outbound memory windows, which could
+ *                       be mapped to access a shared memory
+ * @ntb:	NTB device context.
+ *
+ * Hardware and topology may support a different number of memory windows.
+ * This method returns the number of outbound memory windows supported by
+ * local device.
+ *
+ * Return: the number of memory windows.
+ */
+static inline int ntb_peer_mw_count(struct ntb_dev *ntb)
+{
+	return ntb->ops->peer_mw_count(ntb);
+}
+
+/**
+ * ntb_peer_mw_get_addr() - get map address of an outbound memory window
+ * @ntb:	NTB device context.
+ * @widx:	Memory window index (within ntb_peer_mw_count() return value).
+ * @base:	OUT - the base address of mapping region.
+ * @size:	OUT - the size of mapping region.
+ *
+ * Get base and size of memory region to map.  NULL may be given for any output
+ * parameter if the value is not needed.  The base and size may be used for
+ * mapping the memory window, to access the peer memory.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_peer_mw_get_addr(struct ntb_dev *ntb, int widx,
+				      phys_addr_t *base, resource_size_t *size)
+{
+	return ntb->ops->peer_mw_get_addr(ntb, widx, base, size);
+}
+
+/**
+ * ntb_peer_mw_set_trans() - set a translation address of a memory window
+ *                           retrieved from a peer device
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device the translation address received from.
+ * @widx:	Memory window index.
+ * @addr:	The dma address of the shared memory to access.
+ * @size:	The size of the shared memory to access.
+ *
+ * Set the translation of an outbound memory window.  The local device may
+ * access shared memory allocated by a peer device sent the address.
+ *
+ * This method may not be implemented due to the hardware specific memory
+ * windows interface, so a translation address can be only set on the side,
+ * where shared memory (inbound memory windows) is allocated.
+ *
+ * Return: Zero on success, otherwise an error number.
+ */
+static inline int ntb_peer_mw_set_trans(struct ntb_dev *ntb, int pidx, int widx,
+					u64 addr, resource_size_t size)
+{
+	if (!ntb->ops->peer_mw_set_trans)
+		return 0;
+
+	return ntb->ops->peer_mw_set_trans(ntb, pidx, widx, addr, size);
+}
+
+/**
+ * ntb_peer_mw_clear_trans() - clear the translation address of an outbound
+ *                             memory window
+ * @ntb:	NTB device context.
+ * @pidx:	Port index of peer device.
+ * @widx:	Memory window index.
+ *
+ * Clear the translation of a outbound memory window.  The local device may no
+ * longer access a shared memory through the window.
+ *
+ * This method may not be implemented due to the hardware specific memory
+ * windows interface.
+ *
+ * Return: Zero on success, otherwise an error number.
+ */
+static inline int ntb_peer_mw_clear_trans(struct ntb_dev *ntb, int pidx,
+					  int widx)
+{
+	if (!ntb->ops->peer_mw_clear_trans)
+		return ntb_peer_mw_set_trans(ntb, pidx, widx, 0, 0);
+
+	return ntb->ops->peer_mw_clear_trans(ntb, pidx, widx);
+}
+
 /**
 * ntb_db_is_unsafe() - check if it is safe to use hardware doorbell
 * @ntb:	NTB device context.
@ -900,47 +1226,58 @@ static inline int ntb_spad_is_unsafe(struct ntb_dev *ntb)
 * @ntb:	NTB device context.
 *
 * Hardware and topology may support a different number of scratchpads.
+ * Although it must be the same for all ports per NTB device.
 *
 * Return: the number of scratchpads.
 */
 static inline int ntb_spad_count(struct ntb_dev *ntb)
 {
+	if (!ntb->ops->spad_count)
+		return 0;
+
 	return ntb->ops->spad_count(ntb);
 }

 /**
 * ntb_spad_read() - read the local scratchpad register
 * @ntb:	NTB device context.
- * @idx:	Scratchpad index.
+ * @sidx:	Scratchpad index.
 *
 * Read the local scratchpad register, and return the value.
 *
 * Return: The value of the local scratchpad register.
 */
-static inline u32 ntb_spad_read(struct ntb_dev *ntb, int idx)
+static inline u32 ntb_spad_read(struct ntb_dev *ntb, int sidx)
 {
-	return ntb->ops->spad_read(ntb, idx);
+	if (!ntb->ops->spad_read)
+		return ~(u32)0;
+
+	return ntb->ops->spad_read(ntb, sidx);
 }

 /**
 * ntb_spad_write() - write the local scratchpad register
 * @ntb:	NTB device context.
- * @idx:	Scratchpad index.
+ * @sidx:	Scratchpad index.
 * @val:	Scratchpad value.
 *
 * Write the value to the local scratchpad register.
 *
 * Return: Zero on success, otherwise an error number.
 */
-static inline int ntb_spad_write(struct ntb_dev *ntb, int idx, u32 val)
+static inline int ntb_spad_write(struct ntb_dev *ntb, int sidx, u32 val)
 {
-	return ntb->ops->spad_write(ntb, idx, val);
+	if (!ntb->ops->spad_write)
+		return -EINVAL;
+
+	return ntb->ops->spad_write(ntb, sidx, val);
 }

 /**
 * ntb_peer_spad_addr() - address of the peer scratchpad register
 * @ntb:	NTB device context.
- * @idx:	Scratchpad index.
+ * @pidx:	Port index of peer device.
+ * @sidx:	Scratchpad index.
 * @spad_addr:	OUT - The address of the peer scratchpad register.
 *
 * Return the address of the peer doorbell register.  This may be used, for
@ -948,45 +1285,213 @@ static inline int ntb_spad_write(struct ntb_dev *ntb, int idx, u32 val)
 *
 * Return: Zero on success, otherwise an error number.
 */
-static inline int ntb_peer_spad_addr(struct ntb_dev *ntb, int idx,
+static inline int ntb_peer_spad_addr(struct ntb_dev *ntb, int pidx, int sidx,
 				     phys_addr_t *spad_addr)
 {
 	if (!ntb->ops->peer_spad_addr)
 		return -EINVAL;

-	return ntb->ops->peer_spad_addr(ntb, idx, spad_addr);
+	return ntb->ops->peer_spad_addr(ntb, pidx, sidx, spad_addr);
 }

 /**
 * ntb_peer_spad_read() - read the peer scratchpad register
 * @ntb:	NTB device context.
- * @idx:	Scratchpad index.
+ * @pidx:	Port index of peer device.
+ * @sidx:	Scratchpad index.
 *
 * Read the peer scratchpad register, and return the value.
 *
 * Return: The value of the local scratchpad register.
 */
-static inline u32 ntb_peer_spad_read(struct ntb_dev *ntb, int idx)
+static inline u32 ntb_peer_spad_read(struct ntb_dev *ntb, int pidx, int sidx)
 {
 	if (!ntb->ops->peer_spad_read)
-		return 0;
+		return ~(u32)0;

-	return ntb->ops->peer_spad_read(ntb, idx);
+	return ntb->ops->peer_spad_read(ntb, pidx, sidx);
 }

 /**
 * ntb_peer_spad_write() - write the peer scratchpad register
 * @ntb:	NTB device context.
- * @idx:	Scratchpad index.
+ * @pidx:	Port index of peer device.
+ * @sidx:	Scratchpad index.
 * @val:	Scratchpad value.
 *
 * Write the value to the peer scratchpad register.
 *
 * Return: Zero on success, otherwise an error number.
 */
-static inline int ntb_peer_spad_write(struct ntb_dev *ntb, int idx, u32 val)
+static inline int ntb_peer_spad_write(struct ntb_dev *ntb, int pidx, int sidx,
+				      u32 val)
 {
-	return ntb->ops->peer_spad_write(ntb, idx, val);
+	if (!ntb->ops->peer_spad_write)
+		return -EINVAL;
+
+	return ntb->ops->peer_spad_write(ntb, pidx, sidx, val);
+}
+
+/**
+ * ntb_msg_count() - get the number of message registers
+ * @ntb:	NTB device context.
+ *
+ * Hardware may support a different number of message registers.
+ *
+ * Return: the number of message registers.
+ */
+static inline int ntb_msg_count(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->msg_count)
+		return 0;
+
+	return ntb->ops->msg_count(ntb);
+}
+
+/**
+ * ntb_msg_inbits() - get a bitfield of inbound message registers status
+ * @ntb:	NTB device context.
+ *
+ * The method returns the bitfield of status and mask registers, which related
+ * to inbound message registers.
+ *
+ * Return: bitfield of inbound message registers.
+ */
+static inline u64 ntb_msg_inbits(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->msg_inbits)
+		return 0;
+
+	return ntb->ops->msg_inbits(ntb);
+}
+
+/**
+ * ntb_msg_outbits() - get a bitfield of outbound message registers status
+ * @ntb:	NTB device context.
+ *
+ * The method returns the bitfield of status and mask registers, which related
+ * to outbound message registers.
+ *
+ * Return: bitfield of outbound message registers.
+ */
+static inline u64 ntb_msg_outbits(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->msg_outbits)
+		return 0;
+
+	return ntb->ops->msg_outbits(ntb);
+}
+
+/**
+ * ntb_msg_read_sts() - read the message registers status
+ * @ntb:	NTB device context.
+ *
+ * Read the status of message register. Inbound and outbound message registers
+ * related bits can be filtered by masks retrieved from ntb_msg_inbits() and
+ * ntb_msg_outbits().
+ *
+ * Return: status bits of message registers
+ */
+static inline u64 ntb_msg_read_sts(struct ntb_dev *ntb)
+{
+	if (!ntb->ops->msg_read_sts)
+		return 0;
+
+	return ntb->ops->msg_read_sts(ntb);
+}
+
+/**
+ * ntb_msg_clear_sts() - clear status bits of message registers
+ * @ntb:	NTB device context.
+ * @sts_bits:	Status bits to clear.
+ *
+ * Clear bits in the status register.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_msg_clear_sts(struct ntb_dev *ntb, u64 sts_bits)
+{
+	if (!ntb->ops->msg_clear_sts)
+		return -EINVAL;
+
+	return ntb->ops->msg_clear_sts(ntb, sts_bits);
+}
+
+/**
+ * ntb_msg_set_mask() - set mask of message register status bits
+ * @ntb:	NTB device context.
+ * @mask_bits:	Mask bits.
+ *
+ * Mask the message registers status bits from raising the message event.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_msg_set_mask(struct ntb_dev *ntb, u64 mask_bits)
+{
+	if (!ntb->ops->msg_set_mask)
+		return -EINVAL;
+
+	return ntb->ops->msg_set_mask(ntb, mask_bits);
+}
+
+/**
+ * ntb_msg_clear_mask() - clear message registers mask
+ * @ntb:	NTB device context.
+ * @mask_bits:	Mask bits to clear.
+ *
+ * Clear bits in the message events mask register.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_msg_clear_mask(struct ntb_dev *ntb, u64 mask_bits)
+{
+	if (!ntb->ops->msg_clear_mask)
+		return -EINVAL;
+
+	return ntb->ops->msg_clear_mask(ntb, mask_bits);
+}
+
+/**
+ * ntb_msg_read() - read message register with specified index
+ * @ntb:	NTB device context.
+ * @midx:	Message register index
+ * @pidx:	OUT - Port index of peer device a message retrieved from
+ * @msg:	OUT - Data
+ *
+ * Read data from the specified message register. Source port index of a
+ * message is retrieved as well.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_msg_read(struct ntb_dev *ntb, int midx, int *pidx,
+			       u32 *msg)
+{
+	if (!ntb->ops->msg_read)
+		return -EINVAL;
+
+	return ntb->ops->msg_read(ntb, midx, pidx, msg);
+}
+
+/**
+ * ntb_msg_write() - write data to the specified message register
+ * @ntb:	NTB device context.
+ * @midx:	Message register index
+ * @pidx:	Port index of peer device a message being sent to
+ * @msg:	Data to send
+ *
+ * Send data to a specified peer device using the defined message register.
+ * Message event can be raised if the midx registers isn't empty while
+ * calling this method and the corresponding interrupt isn't masked.
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+static inline int ntb_msg_write(struct ntb_dev *ntb, int midx, int pidx,
+				u32 msg)
+{
+	if (!ntb->ops->msg_write)
+		return -EINVAL;
+
+	return ntb->ops->msg_write(ntb, midx, pidx, msg);
 }

 #endif
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@ -18,6 +18,7 @@ LIST_DEVS=FALSE

 DEBUGFS=${DEBUGFS-/sys/kernel/debug}

+DB_BITMASK=0x7FFF
 PERF_RUN_ORDER=32
 MAX_MW_SIZE=0
 RUN_DMA_TESTS=
@ -38,6 +39,7 @@ function show_help()
 	echo "be highly recommended."
 	echo
 	echo "Options:"
+	echo "  -b BITMASK      doorbell clear bitmask for ntb_tool"
 	echo "  -C              don't cleanup ntb modules on exit"
 	echo "  -d              run dma tests"
 	echo "  -h              show this help message"
@ -52,8 +54,9 @@ function show_help()
 function parse_args()
 {
 	OPTIND=0
-	while getopts "Cdhlm:r:p:w:" opt; do
+	while getopts "b:Cdhlm:r:p:w:" opt; do
 		case "$opt" in
+		b)  DB_BITMASK=${OPTARG} ;;
 		C)  DONT_CLEANUP=1 ;;
 		d)  RUN_DMA_TESTS=1 ;;
 		h)  show_help; exit 0 ;;
@ -85,6 +88,10 @@ set -e
 function _modprobe()
 {
        modprobe "$@"
+
+	if [[ "$REMOTE_HOST" != "" ]]; then
+		ssh "$REMOTE_HOST" modprobe "$@"
+	fi
 }

 function split_remote()
@ -154,7 +161,7 @@ function doorbell_test()

 	echo "Running db tests on: $(basename $LOC) / $(basename $REM)"

-	write_file "c 0xFFFFFFFF" "$REM/db"
+	write_file "c $DB_BITMASK" "$REM/db"

 	for ((i=1; i <= 8; i++)); do
 		let DB=$(read_file "$REM/db") || true