feat: Add TLS support to gnet #435

0-haha · 2023-01-27T22:34:40Z

1. Are you opening this pull request for bug-fixes, optimizations or new feature?

new feature

2. Please describe how these code changes achieve your intention.

This PR adds TLS support to gnet.

Changes to the source code

- The main TLS library is packaged in the directory pkg/tls/.
- internal/boring is a dummy package for boring TLS, which the standard Golang TLS library requires.
- Other changes go to acceptor.go, connection.go, and eventloop.go. Once TLS is enabled on the server side, gnet calls gnetConn.UpgradeTLS() to upgrade the protocol to TLS. After that, all reads and writes go through gnetConn.readTLS() and gnetConn.writeTLS().

The gnet TLS implementation

- is developed based on https://github.com/luyu6056/tls.
- merges the upstream go v1.20rc3 standard TLS library.
  - Since go 1.20 uses crypto/ecdh in crypto/tls/key_agreement.go, which is not available in go <= 1.19, the go version in go.mod is bumped to 1.20.
- adds kernel TLS (kTLS) support, so encryption can be offloaded to the kernel.
  - The kernel TLS implementation is based on @jim3ma's implementation, which was based on @FiloSottile's implementation.
  - Which kernel TLS features are available depends entirely on the kernel version.
  - Kernel TLS features
    - KTLS 1.2 TX & RX
    - KTLS 1.3 TX & RX
    - zero copy and no pad for TLS 1.3
    - cipher suites: AES-GCM-128, AES-GCM-256, CHACHA20POLY1305
  - Kernel TLS TODO
    - KTLS 1.3 RX is disabled on kernel < 5.19 because it causes unexplained packet loss
    - zero copy and no pad have not been tested yet. Zero copy is enabled on kernel >= 5.19, and no pad is enabled on kernel >= 6.0.
    - sendfile API
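
The kernel-version gating described in the list above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `ktlsFeatures`, `parseKernel`, `atLeast`, and `featuresFor` are hypothetical names, and the thresholds are simply the ones stated above (TLS 1.3 RX and zero copy from 5.19, no pad from 6.0).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ktlsFeatures is a hypothetical summary of the optional kernel TLS
// features to enable; the field names are illustrative only.
type ktlsFeatures struct {
	TLS13RX  bool // kTLS 1.3 receive path
	ZeroCopy bool // zero copy on the transmit path
	NoPad    bool // TLS 1.3 "no pad" on the receive path
}

// parseKernel extracts major and minor from a release string such as
// "5.15.0-91-generic".
func parseKernel(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix such as "19-rc1"; keep the leading digits.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// featuresFor maps a kernel release to the features listed above.
func featuresFor(release string) (ktlsFeatures, error) {
	major, minor, err := parseKernel(release)
	if err != nil {
		return ktlsFeatures{}, err
	}
	return ktlsFeatures{
		TLS13RX:  atLeast(major, minor, 5, 19),
		ZeroCopy: atLeast(major, minor, 5, 19),
		NoPad:    atLeast(major, minor, 6, 0),
	}, nil
}

func main() {
	for _, r := range []string{"5.15.0-91-generic", "5.19.0", "6.1.12"} {
		f, err := featuresFor(r)
		fmt.Println(r, f, err)
	}
}
```

On a real system the release string would come from something like uname; how this PR obtains it is not shown here.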

Examples

An example of using gnet TLS can be found at https://github.com/0-haha/gnet_tls_examples. You should be able to run the repo in Docker.

Other Comments

I would say the majority of the implementation is complete, and I am open to further discussion.
We can also discuss this in Chinese.

3. Please link to the relevant issues (if any).

Fixes #16

4. Which documentation changes (if any) need to be made/updated because of this PR?

gnet.Run will accept a tls.Config like this:

cer, err := tls.LoadX509KeyPair("server.crt", "server.key")
if err != nil {
    log.Fatalf("failed to load the key pair: %v", err)
}
// The server only accepts TLS 1.2 and TLS 1.3.
config := &tls.Config{
    MinVersion:   tls.VersionTLS12,
    Certificates: []tls.Certificate{cer},
}
if err := gnet.Run(echo, "tcp://192.168.0.100:8000", gnet.WithTLS(config)); err != nil {
    log.Fatalf("gnet.Run failed: %v", err)
}

5. Checklist

[x] I have squashed all insignificant commits.
[x] I have commented my code for explaining package types, values, functions, and non-obvious lines.
[ ] I have written unit tests and verified that all tests pass (if needed).
[ ] I have documented feature info on the README (only when this PR is adding a new feature).
[x] (optional) I am willing to help maintain this change if there are issues with it later.

1. The kernel TLS implementation is based on https://github.com/jim3ma/go.git branch: dev.ktls.1.16.3
2. Supports: TLS 1.2 & TLS 1.3
3. Supported cipher suites: AES_128_GCM_SHA256, AES_256_GCM_SHA384, CHACHA20_POLY1305_SHA256
4. Server side has been tested and it works. Client side needs to be tested later.
5. TODO: add sendfile(), TLS_TX_ZEROCOPY_RO (device offload), and TLS_RX_EXPECT_NO_PAD. See https://docs.kernel.org/networking/tls.html#optional-optimizations for details.

1. TLS writes the data into the socket directly rather than writing the data into the buffer. The data is buffered only if the error unix.EAGAIN occurs.
2. Add "tlsEnabled bool" to control when to use tlsconn.Write(). The reason is that tlsconn.Write() encrypts the data and then calls gnetConn.Write(), which could potentially call either gnetConn.write() or gnetConn.writeTLS(). Therefore, we set "tlsEnabled" to false before calling tlsconn.Write() and restore it to true afterwards.
3. tlsconn.flush() calls gnetConn.Flush() to flush the buffer immediately. Therefore, we don't need to call gnetConn.Flush() in the gnet TLS handshake phase, as tlsconn.Handshake() calls gnetConn.Flush() implicitly.
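
The "write directly, buffer only on EAGAIN" idea in the first item can be illustrated with a small sketch. It is a simplification under stated assumptions, not the gnet code: `directWriter` and `limitedSock` are hypothetical types, a `bytes.Buffer` stands in for the outbound buffer, and `syscall.EAGAIN` is used in place of `unix.EAGAIN` so that the example needs only the standard library.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"syscall"
)

// directWriter is a hypothetical stand-in for a connection: it tries the
// socket first and keeps only the unwritten tail in a pending buffer.
type directWriter struct {
	sock    io.Writer    // stand-in for the non-blocking socket
	pending bytes.Buffer // stand-in for the outbound buffer
}

// Write sends p straight to the socket. Only if the socket reports EAGAIN
// is the unsent remainder buffered for a later flush. Reporting len(p) as
// accepted in that case is a choice made for this sketch.
func (w *directWriter) Write(p []byte) (int, error) {
	if w.pending.Len() > 0 {
		// Earlier data is still queued, so append to keep the byte order.
		return w.pending.Write(p)
	}
	n, err := w.sock.Write(p)
	if errors.Is(err, syscall.EAGAIN) {
		w.pending.Write(p[n:])
		return len(p), nil
	}
	return n, err
}

// limitedSock accepts at most room bytes in total, then reports EAGAIN.
type limitedSock struct {
	room int
	got  []byte
}

func (s *limitedSock) Write(p []byte) (int, error) {
	n := len(p)
	if n > s.room {
		n = s.room
	}
	s.got = append(s.got, p[:n]...)
	s.room -= n
	if n < len(p) {
		return n, syscall.EAGAIN
	}
	return n, nil
}

func main() {
	sock := &limitedSock{room: 4}
	w := &directWriter{sock: sock}
	n, err := w.Write([]byte("hello world"))
	fmt.Println(n, err, string(sock.got), w.pending.String())
}
```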

panjf2000 · 2023-01-28T15:25:02Z

Thank you for implementing TLS and opening this PR.

It may take me a while to absorb and review the code, but I'll do it as quickly as possible.

Redesign the buffer in the gnet TLS implementation to achieve zero-copy.

Background:
- tlsconn.rawInput: raw input from TCP to hold the TLS record
- tlsconn.input: buffer to hold the decrypted TLS record
- tlsconn.hand: buffer to hold handshake data
- tlsconn.sendBuf: buffer to hold sending data

Problems:
- Memory copy in TLS read: In the previous implementation, tlsconn.input refers to gnetConn.inboundBuffer. To decrypt, we copy el.buffer to tlsconn.rawInput. The TLS connection writes the decrypted data to tlsconn.input, which is gnetConn.inboundBuffer. When el.eventHandler.OnTraffic() is triggered, gnetConn.Next() and gnet.Conn.Peek() can trigger more data copies, as they can write to c.loop.cache().
- Memory copy in TLS write: In the previous implementation, all encrypted data is first written to tlsconn.sendBuf, which refers to gnetConn.outboundBuffer. Then tlsconn.Write() calls gnetConn.Write(), which flushes the buffer to the socket.

New implementation:
We designed LazyBuffer (lb), which has a buf []byte and its reference ref *[]byte. In lazy mode, lb.ref is always nil and lb.buf is read-only. When lb.Write() is called, lb requests a buffer from the sync.Pool and copies lb.buf to the new buffer. Both lb.buf and lb.ref then point to the new buffer.
- New TLS read: With LazyBuffer, we let tlsconn.rawInput refer to el.buffer. Decrypted data is stored in tlsconn.rawInput as well. tlsconn.Data() returns a reference to all decrypted data, which is assigned to gnetConn.buffer.
- New TLS write: tlsconn.Write() first encrypts the data, then calls gnetConn.WriteTCP(), which writes the data directly to the socket.
- New TLS handshake: we restore the tlsconn.Buffering flag, which is only used in the handshake. Incoming handshake data is stored in tlsconn.hand and is discarded immediately after being used. Outgoing handshake data is buffered in tlsconn.sendBuf and is flushed by tlsconn.flush(), which calls gnetConn.WriteTCP() to write the data directly to the socket.
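
A copy-on-write buffer along the lines of the LazyBuffer description above might look like the following. This is a minimal sketch written from the prose only: the field names `buf` and `ref` come from the description, while the method names (`Set`, `Write`, `Bytes`, `Done`), the pool, and its 4096-byte starting capacity are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable backing buffers (capacity chosen arbitrarily).
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// LazyBuffer borrows a slice without copying until the first write.
type LazyBuffer struct {
	buf []byte  // current contents; borrowed and read-only while ref is nil
	ref *[]byte // pooled backing buffer, set once the data has been copied
}

// Set attaches b without copying (lazy mode).
func (lb *LazyBuffer) Set(b []byte) {
	lb.Done()
	lb.buf = b
}

// Write appends p. On the first write the borrowed bytes are copied into a
// pooled buffer so that the original slice is never modified.
func (lb *LazyBuffer) Write(p []byte) {
	if lb.ref == nil {
		lb.ref = bufPool.Get().(*[]byte)
		*lb.ref = append((*lb.ref)[:0], lb.buf...)
	}
	*lb.ref = append(*lb.ref, p...)
	lb.buf = *lb.ref
}

// Bytes returns the current contents.
func (lb *LazyBuffer) Bytes() []byte { return lb.buf }

// Done returns the pooled buffer, if any, and resets the LazyBuffer.
func (lb *LazyBuffer) Done() {
	if lb.ref != nil {
		bufPool.Put(lb.ref)
		lb.ref = nil
	}
	lb.buf = nil
}

func main() {
	src := []byte("record")
	var lb LazyBuffer
	lb.Set(src)
	fmt.Println("shares memory:", &lb.Bytes()[0] == &src[0])
	lb.Write([]byte("+more"))
	fmt.Println(string(lb.Bytes()), string(src))
	lb.Done()
}
```

The point of the design is that the common read path (attach, decrypt in place, hand the slice to the handler) never pays for a copy; only leftover data that must outlive the shared event-loop buffer is copied.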

0-haha · 2023-01-31T04:10:26Z

I optimized the memory copies and buffer usage for TLS read, write, and handshake. The following briefly describes the idea behind the implementation, which should be helpful when reviewing the code.

Buffers used in TLS

rawInput: stores the TLS record from TCP
input: stores the decrypted TLS record, and the memory is owned by rawInput
hand: stores the handshake data
sendBuf: stores the sending data, and it is only used during the handshake

TLS handshake (starts in `eventloop.read()`):

1. Attach eventloop.buffer to tlsconn.rawInput (zero-copy).
2. Call tlsconn.handshake().
3. If tlsconn.HandshakeComplete(), call eventloop.readTLS(); otherwise, return and restart at step 1 in the next round.

Note: tlsconn.handshake() extracts the handshake message from tlsconn.rawInput and passes it to tlsconn.hand without copying (zero-copy).
tlsconn.hand discards each message immediately after using it.

TLS read:

Attach eventloop.buffer to tlsconn.rawInput (zero-copy)

Call eventloop.readTLS(). Since tlsconn.rawInput can hold multiple TLS records, we process all TLS records iteratively.

for {
    Extract the TLS record from "tlsconn.rawInput"
    Decrypt the record into "tlsconn.rawInput"
    tlsconn.data = decrypted record (store the reference, zero-copy)
    gnetConn.buffer = tlsconn.Data() (return tlsconn.data, zero-copy)
    eventHandler.OnTraffic()
    c.inboundBuffer.Write(c.buffer)
    if no more TLS records in "tlsconn.rawInput" {
        discard the data in "tlsconn.rawInput" and "tlsconn.data"
        if there is data left in "tlsconn.rawInput" (an incomplete record), cache it, so that the
        leftover data is no longer owned by "eventloop.buffer", which will be used by another "gnetConn"
        break
    }
}
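
The "extract the TLS record" step of the loop above relies on the 5-byte TLS record header (1 byte content type, 2 bytes legacy version, 2 bytes big-endian payload length). A minimal sketch of walking a raw buffer that may hold several records plus an incomplete tail is shown below; `splitRecords` is a hypothetical helper, not a function from this PR, and decryption is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const recordHeaderLen = 5 // type (1) + legacy version (2) + length (2)

// splitRecords returns the complete TLS records at the front of raw as
// sub-slices (no copy) and the trailing bytes of an incomplete record.
func splitRecords(raw []byte) (records [][]byte, rest []byte) {
	for len(raw) >= recordHeaderLen {
		n := recordHeaderLen + int(binary.BigEndian.Uint16(raw[3:5]))
		if len(raw) < n {
			break // partial record: wait for more bytes from the socket
		}
		records = append(records, raw[:n])
		raw = raw[n:]
	}
	return records, raw
}

func main() {
	// One complete application-data record (type 23, version 0x0303)
	// followed by a second record that is cut off after one payload byte.
	raw := []byte{23, 3, 3, 0, 2, 'h', 'i', 23, 3, 3, 0, 4, 'x'}
	records, rest := splitRecords(raw)

	// The leftover must be copied before the shared buffer is reused,
	// which corresponds to the "cache it" step in the loop above.
	cached := append([]byte(nil), rest...)
	fmt.Println(len(records), len(rest), len(cached))
}
```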

The data pipeline in each iteration is
eventloop.buffer (TLS records from TCP) -> (zero-copy) -> tlsconn.rawInput -> (decrypt TLS records) -> tlsconn.rawInput -> (zero-copy) -> tlsconn.Data
-> (zero-copy) -> gnetConn.buffer -> (used by eventHandler.OnTraffic()) -> (write the rest into) -> c.inboundBuffer

Kernel TLS read pipeline
eventloop.buffer -> (attach to, zero-copy) -> tlsconn.rawInput
ktlsReadRecord -> (decrypt) -> tlsconn.rawInput (referring to eventloop.buffer) -> (zero-copy) -> tlsconn.Data

The rest follows the standard gnet TLS read.

TLS write:

The data pipeline is
data -> (encrypt) -> buffer from sync.Pool -> (call) -> gnetConn.writeTCP() -> (call) -> gnetConn.write() (standard gnet write function) -> return buffer to sync.Pool

Kernel TLS write
Call gnetConn.write(data). When gnetConn writes the data into the socket (by calling unix.Write()), the kernel encrypts the data automatically.

Message marshalling makes use of BytesOrPanic a lot, under the assumption that it will never panic. This assumption was incorrect, and specifically crafted handshakes could trigger panics. Rather than just surgically replacing the usages of BytesOrPanic in paths that could panic, replace all usages of it with proper error returns in case there are other ways of triggering panics which we didn't find.

In one specific case, the tree routed by expandLabel, we replace the usage of BytesOrPanic, but retain a panic. This function already explicitly panicked elsewhere, and returning an error from it becomes rather painful because it requires changing a large number of APIs. The marshalling is unlikely to ever panic, as the inputs are all either fixed length, or already limited to the sizes required. If it were to panic, it'd likely only be during development. A close inspection shows no paths for a user to cause a panic currently.

This patches ends up being rather large, since it requires routing errors back through functions which previously had no error returns. Where possible I've tried to use helpers that reduce the verbosity of frequently repeated stanzas, and to make the diffs as minimal as possible.

Thanks to Marten Seemann for reporting this issue.

Updates #58001
Fixes #58359
Fixes CVE-2022-41724

Change-Id: Ieb55867ef0a3e1e867b33f09421932510cb58851
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1679436
Reviewed-by: Julie Qiu <julieqiu@google.com>
TryBot-Result: Security TryBots <security-trybots@go-security-trybots.iam.gserviceaccount.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
(cherry picked from commit 1d4e6ca9454f6cf81d30c5361146fb5988f1b5f6)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1728205
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/468121
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>

panjf2000 · 2023-04-05T02:48:45Z

connection.go

+	// tlsconn will call gnet.WriteTCP() to send the data directly.
+	// If gnetConn.outboundBuffer is not empty, data will be
+	// buffered in gnetConn.outboundBuffer.
+	n, err = c.tlsconn.Write(data)


Why don't you use outboundBuffer here?

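Because tlsconn.Write transparently encrypts the data and then calls gnetConn.WriteTCP() -> gnetConn.write().

That way the data is written straight to the socket instead of being copied into the outboundBuffer and written afterwards, which performs better.

When kernel TLS is enabled, this can be zero-copy, because the kernel does the encryption transparently.

call stack
c.tlsconn.Write -> gnetConn.WriteTCP() -> gnetConn.write()

gnetConn.write() manages the outboundBuffer transparently.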

panjf2000 · 2023-04-05T02:55:31Z

connection.go

+
+// Expose the plaintext write API which should only be used
+// by tlsconn.Write().
+func (c *conn) WriteTCP(p []byte) (int, error) {


Why do we need this new method?

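Internally, tlsconn.Write() ends up writing data to gnetConn, but gnetConn itself holds a pointer to tlsconn, so this would create an infinite loop.
See https://github.com/0-haha/gnet_go_tls/blob/5728fd829790624e21452e217702b9c9964ded45/conn.go#L920-L940

Because tlsconn.Write() writes the encrypted data to gnetConn, gnetConn needs to expose an API that writes bytes as-is without going through TLS again, which is gnetConn.write().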
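
The cycle described above, and how a separate raw-write method breaks it, can be sketched as follows. The types here are hypothetical stand-ins (`tlsLayer` for tlsconn, `conn` for gnetConn, a `bytes.Buffer` for the socket), and the XOR is only a placeholder for real encryption.

```go
package main

import (
	"bytes"
	"fmt"
)

// rawWriter is the narrow interface the TLS layer needs from the connection.
type rawWriter interface {
	WriteTCP(p []byte) (int, error)
}

// tlsLayer is a stand-in for tlsconn: it transforms the plaintext and hands
// the result to the raw write path of the connection.
type tlsLayer struct {
	raw rawWriter
}

func (t *tlsLayer) Write(p []byte) (int, error) {
	enc := make([]byte, len(p))
	for i, b := range p {
		enc[i] = b ^ 0xFF // placeholder for real encryption
	}
	if _, err := t.raw.WriteTCP(enc); err != nil {
		return 0, err
	}
	return len(p), nil
}

// conn is a stand-in for gnetConn: it owns the TLS layer and the socket.
type conn struct {
	tls  *tlsLayer
	sock bytes.Buffer
}

// Write is the user-facing API and goes through the TLS layer when present.
func (c *conn) Write(p []byte) (int, error) {
	if c.tls != nil {
		return c.tls.Write(p)
	}
	return c.WriteTCP(p)
}

// WriteTCP bypasses the TLS layer. If tlsLayer.Write called c.Write instead,
// the two Write methods would call each other forever.
func (c *conn) WriteTCP(p []byte) (int, error) {
	return c.sock.Write(p)
}

func main() {
	c := &conn{}
	c.tls = &tlsLayer{raw: c}
	n, err := c.Write([]byte("ping"))
	fmt.Println(n, err, c.sock.Len())
}
```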

panjf2000 · 2023-04-05T02:56:22Z

go.mod

@@ -1,11 +1,12 @@
 module github.com/panjf2000/gnet/v2

 require (
+	github.com/0-haha/gnet_go_tls/v120 v120.2.0


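Why isn't the version number v1.20.2?

With v1.20.2 as the version number, go mod tidy reports an error:

go: errors parsing go.mod: /workspace/go.mod:4:2: require github.com/0-haha/gnet_go_tls/v120: version "v1.20.2" invalid: should be v120, not v1

Why use v120?

go mod does not support v1.x in the module path; the suffix must be an integer. To support multiple go versions, such as a future v1.21, the go mod path for go 1.20 is github.com/0-haha/gnet_go_tls/v120

This way of using go mod version numbers is too odd. As I understand it, there is no need to put v120 in go.mod; later version numbers can simply be managed through GitHub releases and tags.

This is the implementation for go 1.20.
https://github.com/0-haha/gnet/blob/dev/pkg/tls/go120.go

go:build is used to select which package gets compiled for each go version.

quic-go simply uses a separate git repo for each go version. Take a look at its imports and you will see.
https://github.com/quic-go/quic-go/tree/master/internal/qtls

So far I have not thought of a better way to maintain multiple versions in a single repository.

@panjf2000 Is the plan to have one TLS library per go version in go mod? Then maintenance can only be done with fork + cherry-pick.

require (
    github.com/0-haha/gnet_go_tls-1-20 v1.20.0
    github.com/0-haha/gnet_go_tls-1-21 v1.21.0
)

Or a single repo with two folders, one folder per go version.

These are the only two options I have come up with. I tried others, but they require a replace directive in go.mod when used; running go mod tidy directly reports an error.

Which option do you prefer?

In go.mod, change it to github.com/0-haha/gnet-tls-go1-20 v1.20.2-rc.1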

resolved in new commits

Fgaoxing · 2023-10-22T06:43:03Z

@panjf2000 When will gnet be able to support TLS?

0-haha added 17 commits January 25, 2023 23:37

Initial commit for TLS implementation. working on the server side.

697b56a

1. merge tls to go 1.20rc3 as close as possible

2e073d2

2. change the gnet API name for the TLS server & client 3. gnet TLS write returns the exact number of bytes written to the socket rather than the length of data.

delete unused file internal/boring/rand.go

fe87eeb

Memory optimization: add the elastic wrapper EMsgBuffer

7c5336a

to MsgBuffer so that the tls conn no longer holds the actual buffer when the connection is idle. Other updates: 1. add defaultSize in MsgBuffer 2. fix the condition to clean up the buffer (i > blockSize to i >= blockSize)

Fix typos

40e9536

bug: fix type not matching in ktlsInBufPool.Get and Put

c7d0993

Add supports to TLS_TX_ZEROCOPY_RO and TLS_RX_EXPECT_NO_PAD,

582f146

but not tested yet

bug: Fix KTLS readRecordOrCCS return EOF

29768bc

data should use the local declaration rather than re-declaring in the if statement, which results len(data) is 0 on line 794, resulting EOF.
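
The kind of bug this commit message describes (a variable re-declared with := inside an if statement, leaving the outer variable empty) can be reproduced in isolation. The sketch below is not the PR's code; `fetch`, `shadowed`, and `fixed` are hypothetical names used only to show the pattern.

```go
package main

import "fmt"

// fetch stands in for any call that returns data and an error.
func fetch() ([]byte, error) { return []byte("payload"), nil }

// shadowed reproduces the pattern: := inside the if statement declares a
// new data variable that goes out of scope when the if statement ends.
func shadowed() int {
	var data []byte
	if data, err := fetch(); err == nil {
		_ = data // inner data, visible only inside this if statement
	}
	return len(data) // the outer data was never assigned
}

// fixed assigns to the variables declared outside the if statement.
func fixed() int {
	var data []byte
	var err error
	if data, err = fetch(); err != nil {
		return 0
	}
	return len(data)
}

func main() {
	fmt.Println(shadowed(), fixed())
}
```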

change int(fd) to fd as fd is already an int.

ee43463

Bug: Fix kTLS 1.3 RX not working on kernel 5.15

8e71e26

1. disable kTLS 1.3 RX on kernel 5.15
2. check zero copy on kernel 5.19
3. check tls 1.3 no pad on kernel 6.0

comment out dead code

3e95281

update go version to 1.20

af39088

TLS: optimize checking if sendBuf is empty or not

492f83e

opt: don't check kTLS supports if kTLS is disabled

43bf39f

opt: remove the dead code

76acc42

0-haha marked this pull request as ready for review January 27, 2023 23:13

panjf2000 added new feature working on it labels Jan 28, 2023

0-haha added 5 commits January 30, 2023 17:43

bug: Fix unix.EAGAIN error returned by TLS read.

d24fd00

opt: replace syscall (deprecated golang library) with golang.org/x/sys/unix

3f21522

opt: remove gnetConn.tlsEnabled & update the doc related to tlsconn

d13ead1

opt: remove unused MsgBuffer & EMsgBuffer

b1b7bc5

0-haha and others added 3 commits February 5, 2023 10:48

Merge branch 'panjf2000:dev' into dev

213300a

Fix: add missing ctx

2b05f32

panjf2000 requested changes Apr 5, 2023

0-haha added 4 commits April 5, 2023 03:41

fix: make comments to english

0ccefca

fix: typos in comments

37393e2

change package name gnet_go_tls/v120 to gnet-tls-go1-20

2705b62

change gnet-tls-go1-20 version v1.20.2-rc.1

bef64fa

panjf2000 mentioned this pull request May 12, 2023

[Question]: Does gnet support the tls? #457

Closed

3 tasks

panjf2000 modified the milestones: v2.3.0, Long term, v1.7.0 May 18, 2023

0-haha added 8 commits May 21, 2023 17:13

Merge branch 'dev' of https://github.com/panjf2000/gnet into panjf2000-dev

25c4638

Fix bugs caused by merging conflicts

ccc7c28

Merge branch 'dev' of https://github.com/panjf2000/gnet into dev

d35e196

Merge branch 'dev' of https://github.com/panjf2000/gnet into panjf2000-dev

9a79add

Merge branch 'panjf2000:dev' into dev

f6206bb

Merge branch 'panjf2000:dev' into dev

9b98998

Merge branch 'panjf2000:dev' into dev

d25b6ab

Merge branch 'dev' of https://github.com/panjf2000/gnet into panjf2000-dev

9015fae

mostafa mentioned this pull request Sep 30, 2023

Refactor server and proxy gatewayd-io/gatewayd#343

Closed

4 tasks

fix the typo

18c311d

0-haha added 3 commits November 3, 2023 23:39

Merge branch 'panjf2000:dev' into dev

e174dc7

Merge branch 'panjf2000:dev' into dev

ecdf787

Merge branch 'dev' into dev

6191b85

panjf2000 force-pushed the dev branch 3 times, most recently from fa1dc24 to 5d1cf9e Compare March 10, 2024 04:32

Merge branch 'panjf2000:dev' into dev

d78adc6

0-haha commented Jan 27, 2023 •

edited by panjf2000
