v1.13 Backports 2024-05-16 #32573

Merged
lmb merged 8 commits into v1.13 from pr/v1.13-backport-2024-05-16-11-35
May 23, 2024


giorio94
Member

@giorio94 giorio94 commented May 16, 2024

Once this PR is merged, a GitHub action will update the labels of these PRs:

 32336 32552

[ upstream commit d0af3d7 ]

We shouldn't import testing code into production code, as it can lead to
unexpected side effects due to e.g., init functions. Let's address this
by hard-coding the "PolicyEnforcement" constant, rather than importing
it. This is consistent with the same usage as part of the "config" command.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit cfb3b8a ]

[ backporter's notes: applied the changes to pkg/clustermesh/config.go ]

It is intended to be used by CLI tools to retrieve the configuration
files of all remote clusters in a given directory, to be used, e.g.,
for troubleshooting purposes.

While at it, let's also replace the path package with the filepath
one, which is more appropriate in this context and would, in theory,
allow handling Windows paths as well.
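
As a rough illustration of the directory listing described above, the following Go sketch collects candidate configuration files from a directory using `os.ReadDir` and `filepath.Join`. The names `isCandidate` and `listClusterConfigs`, as well as the choice to skip subdirectories and dot-prefixed entries, are assumptions made for this example; they are not taken from the actual change in pkg/clustermesh/config.go.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// isCandidate reports whether a directory entry should be treated as a
// cluster configuration file. Skipping subdirectories and dot-prefixed
// entries is an illustrative filter (an assumption), not Cilium's logic.
func isCandidate(name string, isDir bool) bool {
	return !isDir && !strings.HasPrefix(name, ".")
}

// listClusterConfigs returns the paths of all candidate configuration
// files found directly inside dir (hypothetical helper).
func listClusterConfigs(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}

	var paths []string
	for _, entry := range entries {
		if !isCandidate(entry.Name(), entry.IsDir()) {
			continue
		}
		// filepath.Join uses the OS-specific separator, unlike path.Join,
		// which always uses forward slashes.
		paths = append(paths, filepath.Join(dir, entry.Name()))
	}
	return paths, nil
}
```

A CLI tool could call `listClusterConfigs` once on the directory holding the remote cluster configurations and then run its checks on each returned path.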

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 added kind/backports This PR provides functionality previously merged into master. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. area/clustermesh Relates to multi-cluster routing functionality in Cilium. labels May 16, 2024
@giorio94 giorio94 force-pushed the pr/v1.13-backport-2024-05-16-11-35 branch from bb8eb91 to fa3bab2 Compare May 16, 2024 10:33
@giorio94
Member Author

/test-backport-1.13

@giorio94 giorio94 force-pushed the pr/v1.13-backport-2024-05-16-11-35 branch from fa3bab2 to 5d97515 Compare May 16, 2024 12:13
@giorio94
Member Author

/test-backport-1.13

[ upstream commit 2d07cfc ]

[ backporter's notes: replaced cmp.Or usage, as not yet available
  in go 1.20. Additionally replaced tls.VersionName with a local
  implementation, as also not available in go 1.20. ]
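
The two replacements mentioned in the backporter's notes could look roughly like the sketch below. `cmp.Or` was added in Go 1.22 and `tls.VersionName` in Go 1.21, so a Go 1.20 codebase needs local equivalents. The helper names `firstNonZero` and `tlsVersionName` are hypothetical, and the bodies are not copied from the backport.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// firstNonZero returns the first argument that differs from the zero
// value of T, or the zero value if there is none. It mirrors the
// behavior of cmp.Or (hypothetical local stand-in).
func firstNonZero[T comparable](vals ...T) T {
	var zero T
	for _, v := range vals {
		if v != zero {
			return v
		}
	}
	return zero
}

// tlsVersionName returns a human-readable name for a TLS version
// number, falling back to the hexadecimal value for unknown versions
// (hypothetical local stand-in for tls.VersionName).
func tlsVersionName(version uint16) string {
	switch version {
	case tls.VersionTLS10:
		return "TLS 1.0"
	case tls.VersionTLS11:
		return "TLS 1.1"
	case tls.VersionTLS12:
		return "TLS 1.2"
	case tls.VersionTLS13:
		return "TLS 1.3"
	default:
		return fmt.Sprintf("0x%04X", version)
	}
}
```

Both helpers only rely on features available in Go 1.20 (generics were introduced in Go 1.18).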

Troubleshooting etcd connectivity issues, regardless of whether to the
Cilium kvstore or to a remote cluster, is a complex activity, as issues
can concern network connectivity, TLS certificate mismatches, authn/authz
policies and so on.

As an effort to simplify this process, let's introduce a new utility
responsible for performing a set of sanity checks, and outputting the
result in a user-friendly way. This utility is then intended to be
leveraged by dedicated CLI commands integrated with the various
components. In more detail, this utility performs the following
operations:

* Asserts that the etcd configuration can be correctly parsed;
* For each endpoint:
  - Outputs the DNS resolution;
  - Asserts that the endpoint is reachable at the network level (i.e.,
    that a TCP connection can be successfully established);
  - When https is enabled, asserts that a TLS connection can be correctly
    established to the endpoint (i.e., that the provided certificates
    are valid); the check includes both server and client (if enabled)
    authentication; additionally outputs TLS specific information;
  - Outputs the version of the endpoint, as returned by GET /version;
* Outputs information regarding Root CAs and client certificates, if
  configured; additionally checks whether the client certificate is
  valid according to the root CAs;
* Asserts that the etcd client can correctly establish a connection;
* Asserts that the heartbeat key can be retrieved, as a basic
  authorization check.
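
The per-endpoint network and TLS checks listed above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the utility introduced by the commit: the function names `checkTCP`, `checkTLS` and `checkEndpoint`, the `[OK]`/`[FAIL]` report format, and the fixed timeout are all hypothetical. Only standard library calls are used.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// dialTimeout bounds each connection attempt (arbitrary value for this sketch).
const dialTimeout = 5 * time.Second

// checkTCP asserts that a TCP connection to addr ("host:port") can be established.
func checkTCP(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return fmt.Errorf("TCP connection failed: %w", err)
	}
	return conn.Close()
}

// checkTLS asserts that a TLS handshake with addr succeeds using cfg, which
// carries the root CAs and, if client authentication is enabled, the client
// certificate. It returns the negotiated connection state.
func checkTLS(addr string, cfg *tls.Config) (tls.ConnectionState, error) {
	dialer := &net.Dialer{Timeout: dialTimeout}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, cfg)
	if err != nil {
		return tls.ConnectionState{}, fmt.Errorf("TLS handshake failed: %w", err)
	}
	defer conn.Close()
	return conn.ConnectionState(), nil
}

// checkEndpoint runs the checks in order, stops at the first failure, and
// returns one report line per executed check. A nil cfg means plain http.
func checkEndpoint(addr string, cfg *tls.Config) []string {
	var report []string
	if err := checkTCP(addr); err != nil {
		return append(report, "[FAIL] "+err.Error())
	}
	report = append(report, "[OK] TCP connection successfully established")

	if cfg == nil {
		return report
	}
	state, err := checkTLS(addr, cfg)
	if err != nil {
		return append(report, "[FAIL] "+err.Error())
	}
	return append(report, fmt.Sprintf(
		"[OK] TLS connection successfully established (version 0x%04X, cipher suite %s)",
		state.Version, tls.CipherSuiteName(state.CipherSuite)))
}
```

Stopping at the first failed check keeps the report focused on the earliest problem: a failed TCP connection makes any subsequent TLS error uninformative.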

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 9654576 ]

[ backporter's notes: moved the files to cilium/cmd, and performed minor
  adaptations as necessary; additionally dropped the custom dialer usage
  in the troubleshoot clustermesh command, as service name to IP address
  resolution was not necessary in clustermesh in Cilium v1.13, as
  KVStoreMesh was not yet available. ]

Introduce two new cilium-dbg commands, namely "troubleshoot kvstore" and
"troubleshoot clustermesh", responsible for running a set of sanity
checks to help troubleshoot etcd connectivity issues, covering network
connectivity, TLS authentication, authn/authz policies and so on.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 9156e23 ]

[ backporter's notes: changed the cilium command from cilium-dbg to
  cilium. ]

As useful to troubleshoot kvstore and clustermesh issues.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 4172c62 ]

[ backporter's notes: dropped the reference to running the KVStoreMesh
  troubleshoot command, as not relevant in Cilium v1.13. Additionally
  replaced cilium-dbg with cilium. ]

Document the usage of the newly introduced troubleshoot command to
investigate connectivity issues towards the clustermesh control plane
(i.e., etcd) in remote clusters.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 189e8ba ]

[ backporter's notes: dropped the cilium-dbg change, as not applicable
  to Cilium v1.14, and performed minor adaptations. ]

Add a clarification note that the manual steps presented in the guide
are mostly an alternative to using the automatic tools described in the
previous section. Additionally, drop the example errors from the TLS
certificates step, as they are potentially misleading. Users should use
the troubleshoot command instead. Finally, let's fix a couple of typos.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
[ upstream commit 913e41b ]

[ backporter's notes: dropped references to KVStoreMesh, as not
  available in Cilium v1.13, and to initial synchronization checks,
  as not exposed in Cilium v1.13. ]

They apply only when Cilium is configured in kvstore mode, which is
seldom the case these days. The lack of local information is also not
clustermesh specific, and would imply other serious issues. Moreover,
the given checks would not work, and would lead to additional confusion when
Cilium operates in CRD mode. Hence, let's just replace them with the
suggestion of checking whether both Cilium agents and KVStoreMesh
(if enabled) are correctly connected to all remote clusters, and the
synchronization has completed.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the pr/v1.13-backport-2024-05-16-11-35 branch from 5d97515 to 98e3b78 Compare May 16, 2024 15:52
@giorio94
Member Author

/test-backport-1.13

@giorio94
Member Author

giorio94 commented May 17, 2024

/test-1.18-4.19

Hit #30802

@giorio94 giorio94 marked this pull request as ready for review May 17, 2024 11:48
@giorio94 giorio94 requested a review from a team as a code owner May 17, 2024 11:48
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 23, 2024
@lmb lmb merged commit 6e43d0f into v1.13 May 23, 2024
165 checks passed
@lmb lmb deleted the pr/v1.13-backport-2024-05-16-11-35 branch May 23, 2024 10:18
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.

2 participants