Commit Graph

35 Commits

Author SHA1 Message Date
Flavio Crisciani 856b6d4fc7
NetworkDB testing infra
- Diagnose framework that exposes REST API for db interaction
- Dockerfile to build the test image
- Periodic print of stats regarding queue size
- Client and server side for integration with testkit
- Added write-delete-leave-join
- Added test write-delete-wait-leave-join
- Added write-wait-leave-join

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-27 08:50:43 -07:00
Flavio Crisciani ceb8146a90
NetworkDB allow setting PacketSize
- Introduce the possibility to specify the max buffer length
  in network DB. This will allow to use the whole MTU limit of
  the interface

- Add queue stats per network, it can be handy to identify the
  node's throughput per network and identify unbalance between
  nodes that can point to an MTU missconfiguration

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-26 13:44:33 -07:00
Flavio Crisciani 297c3d4ad2
NetworkDB incorrect number of entries in networkNodes
A rapid (within networkReapTime 30min) leave/join network
can corrupt the list of nodes per network with multiple copies
of the same nodes.
The fix makes sure that each node is present only once

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-07-18 16:57:49 -07:00
Santhosh Manohar d28eb6f605 Fix go generate for protobuf
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-07-05 16:31:12 -07:00
Flavio Crisciani d64e71e4f7
Service discovery logic rework
changed the ipMap to SetMatrix to allow transient states
Compacted the addSvc and deleteSvc into a one single method
Updated the datastructure for backends to allow storing all the information needed
to cleanup properly during the cleanupServiceBindings
Removed the enable/disable Service logic that was racing with sbLeave/sbJoin logic
Add some debug logs to track further race conditions

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2017-06-11 20:49:29 -07:00
Santhosh Manohar 9010390940 Handle single manager reload by having workers reconnect
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-05-31 14:36:23 -07:00
Santhosh Manohar 4b2279dc86 control-plane hardning: cleanup local state on peer leaving a network
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-03-31 01:49:03 -07:00
Santhosh Manohar 4693eab00d swarm mode network inspect should provide cluser-wide task details
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2017-03-10 19:12:00 -08:00
Santhosh Manohar b2a41c17d9 Check for node's presence in networkDB's node map before accessing.
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-12-05 00:58:59 -08:00
Daehyeok Mun d55c701e0c Fixed misspelling
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
2016-11-28 11:46:52 -07:00
Santhosh Manohar 19e42ae0e7 Separate service LB & SD from network plumbing
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-11-17 13:09:14 -08:00
Victor Vieux 846236b6c8 fix unsafe acces on arm
Signed-off-by: Victor Vieux <vieux@docker.com>
2016-11-10 23:05:11 -08:00
allencloud 5794d382d7 remove unused mConfig
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-11-08 18:18:55 +08:00
Alessandro Boch a98901aebe Merge pull request #1519 from sanimej/newlb
Add sandbox API for task insertion to service LB and service discovery
2016-11-03 13:31:46 -07:00
Santhosh Manohar fb3c38d655 Add NetworkDB API to fetch the per network peer (gossip cluster) list
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-11-02 13:58:15 -07:00
Santhosh Manohar 16df588f51 Add sandbox API for task insertion to service LB and service discovery
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-10-25 05:41:44 -07:00
Santhosh Manohar caafbccb27 Reap failed nodes after 24 hours
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-10-20 11:24:04 -07:00
Alessandro Boch 9fbb4ecbb4 Merge pull request #1476 from sanimej/time
Use monotonic clock source to reap networkDB entries
2016-10-20 07:30:41 -07:00
Santhosh Manohar f4d61688ae Use monotonic clock for reaping networkDB entries
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-10-19 22:30:47 -07:00
Alexander Morozov eacb25db31 networkdb: fix race in deleteNetwork
There are multiple places which reads from that slice(i.e. bulkSync).

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-10-12 08:42:05 -07:00
Jana Radhakrishnan 520a4c52b8 Purge stale nodes with same prefix and IP
Since the node name randomization fix, we need to make sure that we
purge the old node with the same prefix and same IP from the nodes
database if it still present. This causes unnecessary reconnect
attempts.

Also added a change to avoid unnecessary update of local lamport time
and only do it of we are ready to do a push pull on a join. Join should
happen only when the node is bootstrapped or when trying to reconnect
with a failed node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-23 14:48:54 -07:00
Jana Radhakrishnan 8b04ffb31a Honor user provided listen address for gossip
If user provided a non-zero listen address, honor that and bind only to
that address. Right now it is not honored and we always bind to all ip
addresses in the host.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-22 11:41:57 -07:00
Jana Radhakrishnan 716810dc9f Recover from transient gossip failures
Currently if there is any transient gossip failure in any node the
recoevry process depends on other nodes propogating the information
indirectly. In cases if these transient failures affects all the nodes
that this node has in its memberlist then this node will be permenantly
cutoff from the the gossip channel. Added node state management code in
networkdb to address these problems by trying to rejoin the cluster via
the failed nodes when there is a failure. This also necessitates the
need to add new messages called node event messages to differentiate
between node leave and node failure.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-09-19 15:58:14 -07:00
Santhosh Manohar 2fb2fd20c7 Add sandbox API for task insertion to service LB and service discovery
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-09-08 17:39:45 -07:00
Santhosh Manohar a6dff4cd6a Merge pull request #1406 from mrjana/bugs
Ensure add newly joined node to networknodes
2016-08-21 22:03:03 -07:00
Jana Radhakrishnan 0a5280df92 Ensure add newly joined node to networknodes
In cases a node left the cluster and quickly rejoined before the node
entry is expired by other nodes in the cluster, when the node rejoins we
fail to add it to the quick lookup database. Fixed it.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-19 17:18:15 -07:00
Jana Radhakrishnan dd3b13182a Ignore delete events for non-existent entries
In networkdb we should ignore delete events for entries which doesn't
exist in the db. This is always true because if the entry did not exist
then the entry has been removed way earlier and got purged after the
reap timer and this notification is very stale.

Also there were duplicate delete notifications being sent to the
clients. One when the actual delete event was received from gossip and
later when the entry was getting reaped. The second notification is
unnecessary and may cause issues with the clients if they are not
coded for idempotency.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-08-18 13:57:24 -07:00
Santhosh Manohar 9ee9aa0947 Cleanup networkdb state when the network is deleted locally
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-08-10 12:44:05 -07:00
Alexander Morozov 203dd17115 networkdb: fix data races in map access
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
2016-08-05 14:24:30 -07:00
Madhu Venugopal f6d896889d Adding Advertise-addr support
With this change, all the auto-detection of the addresses are removed
from libnetwork and the caller takes the responsibilty to have a proper
advertise-addr in various scenarios (including externally facing public
advertise-addr with an internal facing private listen-addr)

Signed-off-by: Madhu Venugopal <madhu@docker.com>
2016-07-21 02:44:25 -07:00
Jana Radhakrishnan ddecda9cfe Properly purge node networks when node goes away
When a node goes away purge all the network attachments from the node
and make sure we don't attempt bulk syncing to that node once removed.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-06-14 12:39:38 -07:00
Santhosh Manohar 9a0ad6492f Add support for encrypting gossip traffic
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
2016-06-04 03:55:14 -07:00
Jana Radhakrishnan a608012e97 Use protobuf in networkdb core messages
Convert all networkdb core message types from go message types to
protobuf message types. This faciliates future modification of the
message structure without breaking backward compatibility.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-05-17 09:18:24 -07:00
Jana Radhakrishnan abd19c0d29 Fix gossip network event overwriting self
When a node joins a network it sends out a gossip event before it
updates it's own in-memory state. This can create a race where the node
gets the event back from a remote node before we update in-memory state
and we treat that as latest state. To avoid this race, always generate
the gossip after updating local state.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-04-25 09:47:36 -07:00
Jana Radhakrishnan dd4950f36d Add network scoped gossip database
Network DB is a network scoped gossip database built
on top of hashicorp/memberlist providing an eventually
consistent state store.

It limits the scope of the gossip and periodic bulk syncing
for table entries to only the nodes which participate in the
network to which the gossip belongs. This designs make the
gossip layer scale better and only consumes resources for the
network state that the node participates in.

Since the complete state for a network is maintained by all nodes
participating in the network, all nodes will eventually converge
to the same state.

NetworkDB also provides facilities for the users of the package to
watch on any table (or all tables) and get notified if there are
state changes of interest that happened anywhere in the cluster when
that state change eventually finds it's way to the watcher's node.

Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
2016-04-08 12:58:09 -07:00