feat: Add TLS support to gnet #435
base: dev
2. Change the gnet API name for the TLS server & client.
3. gnet TLS write returns the exact number of bytes written to the socket rather than the length of the data.
to MsgBuffer so that the tls conn no longer holds the actual buffer when the connection is idle.
Other updates:
1. add defaultSize in MsgBuffer
2. fix the condition to clean up the buffer (i > blockSize to i >= blockSize)
1. The kernel TLS implementation is based on https://github.com/jim3ma/go.git branch: dev.ktls.1.16.3
2. Supports: TLS 1.2 & TLS 1.3
3. Supported cipher suites: AES_128_GCM_SHA256, AES_256_GCM_SHA384, CHACHA20_POLY1305_SHA256
4. Server side has been tested and it works. Client side needs to be tested later
5. TODO: add sendfile(), TLS_TX_ZEROCOPY_RO (device offload), and TLS_RX_EXPECT_NO_PAD. See https://docs.kernel.org/networking/tls.html#optional-optimizations for details.
but not tested yet
data should reuse the existing local declaration rather than being re-declared inside the if statement; the re-declaration leaves len(data) at 0 on line 794, which results in EOF.
1. disable kTLS 1.3 RX on kernel 5.15
2. check zero copy on kernel 5.19
3. check TLS 1.3 no pad on kernel 6.0
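For reviewers, the kernel checks listed in this commit can be pictured as a small feature gate. This is only a sketch, not the PR's code: kernelVersion, ktlsFeatures, and featuresFor are hypothetical names, it assumes "on kernel X" means "X and later", and it models only the zero-copy and no-pad thresholds (the TLS 1.3 RX rule is left out because its exact range is not spelled out here).

```go
package main

import "fmt"

// kernelVersion is a hypothetical type holding a parsed "major.minor".
type kernelVersion struct{ major, minor int }

// atLeast reports whether v is the same as or newer than major.minor.
func (v kernelVersion) atLeast(major, minor int) bool {
	if v.major != major {
		return v.major > major
	}
	return v.minor >= minor
}

// ktlsFeatures lists the optional optimizations modeled in this sketch.
type ktlsFeatures struct {
	zeroCopy bool // TLS_TX_ZEROCOPY_RO
	noPad    bool // TLS_RX_EXPECT_NO_PAD (TLS 1.3)
}

// featuresFor applies the thresholds assumed above: 5.19 and 6.0.
func featuresFor(v kernelVersion) ktlsFeatures {
	return ktlsFeatures{
		zeroCopy: v.atLeast(5, 19),
		noPad:    v.atLeast(6, 0),
	}
}

func main() {
	for _, v := range []kernelVersion{{5, 15}, {5, 19}, {6, 0}} {
		fmt.Printf("%d.%d: %+v\n", v.major, v.minor, featuresFor(v))
	}
}
```

Keeping the thresholds in one function would make it easy to add further checks later without touching the connection code.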
1. TLS writes the data to the socket directly rather than into the buffer. The data is buffered only if the error unix.EAGAIN occurs.
2. Add "tlsEnabled bool" to control when to use tlsconn.Write(). The reason is that tlsconn.Write() encrypts the data and then calls gnetConn.Write(), which could potentially call either gnetConn.write() or gnetConn.writeTLS(). Therefore, we set "tlsEnabled" to false before calling tlsconn.Write() and restore it to true afterwards.
3. tlsconn.flush() calls gnetConn.Flush() to flush the buffer immediately. Therefore, we don't need to call gnetConn.Flush() in the gnet TLS handshake phase, as tlsconn.Handshake() calls gnetConn.Flush() implicitly.
Thank you for implementing TLS and opening this PR. It might take me a while to absorb and review the code, but I'll do this as fast as possible.
Redesign the buffer in the gnet TLS implementation to achieve zero-copy.
Background:
- tlsconn.rawInput: raw input from TCP that holds the TLS record
- tlsconn.input: buffer that holds the decrypted TLS record
- tlsconn.hand: buffer that holds handshake data
- tlsconn.sendBuf: buffer that holds data to be sent
Problems:
- Memory copy in TLS read: In the previous implementation, tlsconn.input refers to gnetConn.inboundBuffer. To decrypt, we copy el.buffer to tlsconn.rawInput. The TLS connection then writes the decrypted data to tlsconn.input, which is gnetConn.inboundBuffer. When el.eventHandler.OnTraffic() is triggered, gnetConn.Next() and gnet.Conn.Peek() can trigger more data copies, as they can write to c.loop.cache().
- Memory copy in TLS write: In the previous implementation, all encrypted data is first written to tlsconn.sendBuf, which refers to gnetConn.outboundBuffer. Then tlsconn.Write() calls gnetConn.Write(), which flushes the buffer to the socket.
New implementation:
We designed LazyBuffer (lb), which has a buf []byte and its reference ref *[]byte. In lazy mode, lb.ref is always nil and lb.buf is read-only. When lb.Write() is called, lb requests a buffer from the sync.Pool and copies lb.buf to the new buffer. Both lb.buf and lb.ref then point to the new buffer.
- New TLS read: With LazyBuffer, we let tlsconn.rawInput refer to el.buffer. Decrypted data is stored in tlsconn.rawInput as well. tlsconn.Data() returns a reference to all decrypted data, which is assigned to gnetConn.buffer.
- New TLS write: tlsconn.Write() first encrypts the data, then calls gnetConn.WriteTCP(), which writes the data directly to the socket.
- New TLS handshake: We restore the tlsconn.Buffering flag, which is only used in the handshake. Incoming handshake data is stored in tlsconn.hand and is discarded immediately after being used. Outgoing handshake data is buffered in tlsconn.sendBuf and is flushed by tlsconn.flush(), which calls gnetConn.WriteTCP() to write the data directly to the socket.
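To make the LazyBuffer description easier to follow, here is a minimal copy-on-write sketch built only from what the commit message states (a buf []byte, a ref *[]byte that is nil in lazy mode, and a sync.Pool used on the first write). The method names Set and Done, the 4096-byte initial capacity, and the lowercase type name are assumptions for illustration; the PR's actual LazyBuffer may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// pool hands out reusable backing slices. Storing *[]byte avoids an extra
// allocation each time a slice is put back into the pool.
var pool = sync.Pool{New: func() interface{} {
	b := make([]byte, 0, 4096) // assumed initial capacity
	return &b
}}

// lazyBuffer borrows a read-only slice until the first write.
type lazyBuffer struct {
	buf []byte  // current view of the data
	ref *[]byte // pooled backing slice; nil while in lazy mode
}

// Set puts the buffer in lazy mode on top of borrowed memory (no copy).
func (lb *lazyBuffer) Set(p []byte) {
	lb.Done()
	lb.buf = p
}

// Write appends p. On the first write the borrowed bytes are copied into
// pooled memory, so the borrowed slice is never modified.
func (lb *lazyBuffer) Write(p []byte) {
	if lb.ref == nil {
		ref := pool.Get().(*[]byte)
		*ref = append((*ref)[:0], lb.buf...)
		lb.ref = ref
		lb.buf = *ref
	}
	lb.buf = append(lb.buf, p...)
	*lb.ref = lb.buf // keep the pooled slice in sync if append reallocated
}

// Done returns pooled memory (if any) and resets the buffer to empty.
func (lb *lazyBuffer) Done() {
	if lb.ref != nil {
		pool.Put(lb.ref)
		lb.ref = nil
	}
	lb.buf = nil
}

func main() {
	src := []byte("hello")
	var lb lazyBuffer
	lb.Set(src)
	fmt.Println("shares memory with src:", &lb.buf[0] == &src[0])
	lb.Write([]byte(" world"))
	fmt.Println(string(lb.buf), "| src still:", string(src))
	lb.Done()
}
```

The point of the design is that the common read path never pays for a copy; only a connection that actually needs to append to the buffer takes memory from the pool.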
I optimized the memory copy and buffer usage for TLS read, write, and handshake. The following briefly describes the implementation idea, which should be helpful when reviewing the code.
Buffers used in TLS
TLS handshake (starts in
Message marshalling makes use of BytesOrPanic a lot, under the assumption that it will never panic. This assumption was incorrect, and specifically crafted handshakes could trigger panics. Rather than just surgically replacing the usages of BytesOrPanic in paths that could panic, replace all usages of it with proper error returns in case there are other ways of triggering panics which we didn't find.
In one specific case, the tree routed by expandLabel, we replace the usage of BytesOrPanic, but retain a panic. This function already explicitly panicked elsewhere, and returning an error from it becomes rather painful because it requires changing a large number of APIs. The marshalling is unlikely to ever panic, as the inputs are all either fixed length, or already limited to the sizes required. If it were to panic, it'd likely only be during development. A close inspection shows no paths for a user to cause a panic currently.
This patch ends up being rather large, since it requires routing errors back through functions which previously had no error returns. Where possible I've tried to use helpers that reduce the verbosity of frequently repeated stanzas, and to make the diffs as minimal as possible.
Thanks to Marten Seemann for reporting this issue.
Updates #58001
Fixes #58359
Fixes CVE-2022-41724
Change-Id: Ieb55867ef0a3e1e867b33f09421932510cb58851
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1679436
Reviewed-by: Julie Qiu <julieqiu@google.com>
TryBot-Result: Security TryBots <security-trybots@go-security-trybots.iam.gserviceaccount.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
(cherry picked from commit 1d4e6ca9454f6cf81d30c5361146fb5988f1b5f6)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1728205
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/468121
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
connection.go
Outdated
// tlsconn will call gnet.WriteTCP() to send the data directly.
// If gnetConn.outboundBuffer is not empty, data will be
// buffered in gnetConn.outboundBuffer.
n, err = c.tlsconn.Write(data)
Why don't you use outboundBuffer here?
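Because tlsconn.Write transparently encrypts data and then calls gnetConn.WriteTCP() -> gnetConn.write().
This way the data is written straight to the socket instead of being copied into the outbound buffer and written from there, which performs better.
With kernel TLS enabled this can be zero-copy, because the kernel does the encryption transparently.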
call stack
c.tlsconn.Write
-> gnetConn.WriteTCP()
-> gnetConn.write()
gnetConn.write() manages the outbound buffer transparently.
connection.go
Outdated
// Expose the plaintext write API which should only be used
// by tlsconn.Write().
func (c *conn) WriteTCP(p []byte) (int, error) {
Why do we need this new method?
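Internally, tlsconn.Write() ultimately writes data to gnetConn, but gnetConn in turn holds a pointer to tlsconn, so there would be an infinite loop.
See https://github.com/0-haha/gnet_go_tls/blob/5728fd829790624e21452e217702b9c9964ded45/conn.go#L920-L940
Because tlsconn.Write() writes the already encrypted data to gnetConn, gnetConn needs to expose an API that only does the plaintext write, which is gnetConn.write().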
go.mod
Outdated
@@ -1,11 +1,12 @@
module github.com/panjf2000/gnet/v2

require (
	github.com/0-haha/gnet_go_tls/v120 v120.2.0
Why isn't the version number v1.20.2?
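With v1.20.2 as the version number, go mod tidy reports an error:
go: errors parsing go.mod:
/workspace/go.mod:4:2: require github.com/0-haha/gnet_go_tls/v120: version "v1.20.2" invalid: should be v120, not v1
Why v120?
The major version suffix in a Go module path cannot be v1.x; it must be an integer. To support multiple Go versions, such as a future v1.21, the module path for Go 1.20 is github.com/0-haha/gnet_go_tls/v120.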
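This way of using go mod version numbers is too unusual. As I understand it, there is no need to add v120 in go.mod; later versions can simply be managed through GitHub releases and tags.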
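This is the implementation for Go 1.20.
https://github.com/0-haha/gnet/blob/dev/pkg/tls/go120.go
go:build constraints are used to select which package is compiled for each Go version.
quic-go simply uses a separate git repo for each Go version. Take a look at its imports and you will see.
https://github.com/quic-go/quic-go/tree/master/internal/qtls
So far I have not come up with a better way to maintain multiple versions in a single repository.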
@panjf2000 Is the plan to have one TLS library per Go version in go.mod? Then maintenance can only be done with fork + cherry-pick.
require (
github.com/0-haha/gnet_go_tls-1-20 v1.20.0
github.com/0-haha/gnet_go_tls-1-21 v1.21.0
)
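Or use a single repo with two folders, one folder per Go version.
These are the only two options I have come up with. I tried others, but they require a replace directive in go.mod to be usable; running go mod tidy directly reports an error.
Which option do you prefer?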
Changed the entry in go.mod to github.com/0-haha/gnet-tls-go1-20 v1.20.2-rc.1
resolved in new commits
@panjf2000 May I ask when gnet will support TLS?
fa1dc24 to 5d1cf9e
1. Are you opening this pull request for bug-fixes, optimizations or new feature?
new feature
2. Please describe how these code changes achieve your intention.
This PR adds TLS support to gnet.
Change of the source code
pkg/tls/
internal/boring is the dummy package for boring TLS, as it is required by the standard Golang TLS library
acceptor.go, connection.go, and eventloop.go. Basically, once TLS is enabled on the server side, it will call gnetConn.UpgradeTLS() to upgrade the protocol to TLS. Then, all reads and writes will go to gnetConn.readTLS() and gnetConn.writeTLS().
The gnet TLS implementation
crypto/ecdh in crypto/tls/key_agreement.go, which is not available in go <= 1.19, go.mod is bumped up to 1.20.
kernel >= 5.19, and no pad is enabled on kernel >= 6.0.
Examples
An example of using gnet TLS can be found at https://github.com/0-haha/gnet_tls_examples. You should be able to run the repo in Docker.
Other Comments
I would say the majority of the implementation has been completed. Open to ongoing conversations.
Happy to discuss in Chinese as well.
3. Please link to the relevant issues (if any).
Fixes #16
4. Which documentation changes (if any) need to be made/updated because of this PR?
The gnet.Run command will accept the tls.Config like this:
4. Checklist