Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add TLS support to gnet #435

Open
wants to merge 47 commits into
base: dev
Choose a base branch
from
Open

feat: Add TLS support to gnet #435

wants to merge 47 commits into from

Conversation

0-haha
Copy link
Contributor

@0-haha 0-haha commented Jan 27, 2023

1. Are you opening this pull request for bug-fixes, optimizations or new feature?

new feature

2. Please describe how these code changes achieve your intention.

This PR is to add the TLS support to get.

Change of the source code

  • The main TLS library is packaged in the directory pkg/tls/
  • internal/boring is the dummy package for boring TLS as it is required by the standard Golang TLS library
  • Other changes go to acceptor.go, connection.go, and eventloop.go. Basically, once the TLS enabled on the server side,
    it will call gnetConn.UpgradeTLS() to upgrade the protocol to TLS. Then, all reads and writes will go to the gnetConn.readTLS() and gnetConn.writeTLS()

The gnet TLS implementation

  • is developed based on https://github.com/luyu6056/tls.
  • merges the upstream go v1.20rc3 standard TLS library.
    • Since go 1.20 uses crypto/ecdh in crypto/tls/key_agreement.go, which is not available in go <= 1.19, go.mod is bumped up to 1.20.
  • adds the Kernel TLS support. So one can offload the encryption to the kernel.
    • The implementation of kernel TLS was based on @jim3m's implementation, which was based on @FiloSottile's implementation.
    • Kernel TLS is totally depending on the kernel version.
    • Kernel TLS Features
      • KTLS 1.2 TX & RX
      • KTLS 1.3 TX & RX
      • zerocopy and no pad for TLS 1.3
      • ciphersuites: AES-GCM-128, AES-GCM-256, CHACHA20POLY1305
    • Kernel TLS TODO
      • KTLS 1.3 RX disabled on kernel < 5.19 as it causes weird package lost
      • zero copy and no pad have not been tested yet. zero copy is enabled on kernel >= 5.19, and no pad is enabled on kernel >= 6.0.
      • sendfile api

Examples

An example of using gnet TLS can found at https://github.com/0-haha/gnet_tls_examples. You should be able to run the repo in docker.

Other Comments

I would say the majority of the implementation has been completed. Open to ongoing conversations.
中文可以聊。

3. Please link to the relevant issues (if any).

Fixes #16

4. Which documentation changes (if any) need to be made/updated because of this PR?

The gnet.Run command will accept the tls.Config like this:

cer, _ := tls.LoadX509KeyPair("server.crt", "server.key")
// server only uses TLS 1.2 and TLS 1.3
config := &tls.Config{
    MinVersion:   tls.VersionTLS12,
    Certificates: []tls.Certificate{cer},
}
gnet.Run(echo, "tcp://192.168.0.100:8000", gnet.WithTLS(config))

4. Checklist

  • [ x] I have squashed all insignificant commits.
  • [ x] I have commented my code for explaining package types, values, functions, and non-obvious lines.
  • I have written unit tests and verified that all tests passes (if needed).
  • I have documented feature info on the README (only when this PR is adding a new feature).
  • [ x] (optional) I am willing to help maintain this change if there are issues with it later.

2. change the gnet API name for the TLS server & client
3. gnet TLS write returns the exact number of bytes
    written to the socket rather than the lenght of data.
to MsgBuffer so that the tls conn not longer holds the actual buffer
when the connection is idle.

Other updates:
1. add defaultSize in MsgBuffer
2. fix the condition to clean up the buffer
     (i > blockSize to i >= blockSize)
1. The kernel TLS implementation is based on
     https://github.com/jim3ma/go.git
     branch: dev.ktls.1.16.3
2. Supports: TLS1.2 & TLS 1.3
3. Supported cipher suites:
     AES_128_GCM_SHA256
     AES_256_GCM_SHA384
     CHACHA20_POLY1305_SHA256
4. Server side has been tested and it works.
    Client side needs to be tested later
5. TODO: add sendfile(), TLS_TX_ZEROCOPY_RO (device offload),
    and TLS_RX_EXPECT_NO_PAD. (See
    https://docs.kernel.org/networking/tls.html#optional-optimizations)
    for details.
data should use the local declaration rather than re-declaring
in the if statement, which results len(data) is 0 on line 794,
resulting EOF.
=======================================
1. disable kTLS 1.3 RX on kernel 5.15
2. check zero copy on kernel 5.19
3. check tls 1.3 no pad on kernel 6.0
======================================
1. TLS writes the data into the socket directly rather than writing the
    data into the buffer. the data is buffered only if error unix.EAGAIN
    occurs.
2. Add "tlsEnabled bool" to control when to use tlsconn.Write(). The
    reason is that tlsconn.Write() encrypt the data, then calls
    gnetConn.Write() which could potently call either gnetConn.write()
    or gnetConn.writeTLS(). Therefore, we make "tlsEnabled" to false
    before calling tlsconn.Write(), and then restore "tlsEnabled" to
    true after that.
3. tlsconn.flush() calls gnetConn.Flush() to flush the buffer
    immediately. Therefore, we don't need to call gnetConn.Flush() in
    gnet TLS handshake phase as tlsconn.Handshake() calls
    gnetConn.Flush() implicitly.
@0-haha 0-haha marked this pull request as ready for review January 27, 2023 23:13
@panjf2000
Copy link
Owner

Thank you for implementing TLS and opening this PR.

This might cost me a lot of time to absorb and review the code, but I'll do this as fast as possible.

========================================
Redesign the buffer in gnet TLS implementation
to achieve zero-copy.

Background:
- tlsconn.rawInput: raw input from TCP to hold the TLS record
- tlsconn.input: buffer to hold decrypted TLS record
- tlsconn.hand: buffer to hold handshake data
- tlsconn.sendBuf: buffer to hold sending data

Problems:
- Memory copy in TLS read:
  In the previous implementation, tlsconn.input refers
  to the gnetConn.inboundBuffer. To decrypted, we copy
  el.buffer to tlsconn.rawInput. The TLS connection,
  write the decrypted data to tlsconn.input, which is
  gnetConn.inboundBuffer. When el.eventHandler.OnTraffic()
  is triggered, gnetConn.Next() and gnet.Conn.Peek()
  can trigger more data copy as it can write to c.loop.cache()
- Memory copy in TLS write:
  In the previous implementation, all encrypted data are
  first written to tlsconn.sendBuf, which refers to
  gnetConn.outboundBuffer. Then, tlsconn.Write() calls
  gnetConn.Write() which flushes the buffer to the socket

New implementation:
We designed LazyBuffer (lb) which has a buf []byte and its
reference ref *[]byte. In the lazy mode, lb.ref is always nil,
lb.buf is readonly. When calling lb.Write(), lb request a buffer
from the sync.Pool, and copies lb.buf to the new buffer.
Both lb.buf and lb.ref point to the new buffer.
- New TLS read:
  With LazyBuffer, we let tlsconn.rawInput refer to el.buffer.
  Decrypted data stores in tlsconn.rawInput as well. tlsconn.Data()
  returns the reference of all decrypted data, and will be
  assigned to gnetConn.buffer.
- New TLS write:
  tlsconn.Write() first encrypts the data, then calls
  gnetConn.WriteTCP() which directly writes the data
  to the socket.
- New TLS handshake:
  we restore the tlsconn.Buffering flag which is only used in
  the handshake. Incoming handshake data is stored
  in tlsconn.hand and will be discarded immediately
  after being used. Outgoing handshake data is buffered
  in tlsconn.sendBuf, and will be flushed after calling
  tlsconn.flush() which calls gnetConn.WriteTCP() which
  directly writes the data to the socket.
@0-haha
Copy link
Contributor Author

0-haha commented Jan 31, 2023

I optimized the memory copy and buffer usage for TLS read, write, and handshake. The following briefly describes the implementation idea which would be helpful to review the code

Buffers used in TLS

  • rawInput: stores the TLS record from TCP
  • input: stores the decrypted TLS record, and the memory is owned by rawInput
  • hand: stores the handshake data
  • sendBuf: stores the sending data, and it is only used during the handshake

TLS handshake (starts in eventloop.read()):

  1. Attach eventloop.buffer to tlsconn.rawInput (zero-copy)
  2. Call the tlsconn.handshake()
  3. If tlsconn.HandshakeComplete(), call eventloop.readTLS(); Otherwise, return and will restart at 1 in the next round

Note: In tlsconn.handshake(), it extracts the handshake message from tlsconn.rawInput and zero-copys the message to tls.hand.
tls.hand discards the messages immediately after using it.

TLS read:

  1. Attach eventloop.buffer to tlsconn.rawInput (zero-copy)
  2. Call eventloop.readTLS(). Since tlsconn.rawInput can holds multiple TLS records, we iteratively process all TLS records.
    for {
        Extract the TLS record from "tlsconn.rawInput"
        Decrypt the record into "tlsconn.rawInput"
        tlsconn.data = decrypted record (store the reference, zero-copy)
        gnetConn.buffer = tlsconn.Data() (return tlsconn.data, zero-copy)
        eventHandler.OnTraffic()
        c.inboundBuffer.Write(c.buffer)
        if no more TLS records in "tlsconn.rawInput" {
            discard the data in "tlsconn.rawInput" and "tlsconn.data"
            if there is data left in "tlsconn.rawInput", we cache it. so, the left data is not owned by "eventloop.buffer" which will be used by another "gnetConn"
        }
    }
    
    The data pipeline in each iteration is
    eventloop.buffer (TLS records from TCP) -> (zero-copy) -> tlsconn.rawInput -> (decrypt TLS records) -> tlsconn.rawInput -> (zero-copy) -> tlsconn.Data
    -> (zero-copy) -> gnetConn.buffer -> (used by eventHandler.OnTraffic()) -> (write rest into to) -> c.inboundBuffer
  • Kernel TLS read pipeline
    eventloop.buffer -> (attach to, zero-copy) -> tlsconn.rawInput
    ktlsReadRecord -> (decrypt) -> tlsconn.rawInput (referring to eventloop.buffer) -> (zero-copy) -> tlsconn.Data

    The rest follow the standard gnet TLS read.

TLS write:

The data pipeline is
data -> (encrypt) -> buffer from sync.Pool -> (call) -> gnetConn.writeTCP() -> (call) -> gnetConn.write() (standard gnet write function) -> return buffer tosync.Pool

  • Kernel TLS write
    call gnetConn.write(data). When gnetConn writes the data into the socket (call unix.Write()), kernel encrypts the data automatically.

0-haha and others added 3 commits February 5, 2023 10:48
Message marshalling makes use of BytesOrPanic a lot, under the
assumption that it will never panic. This assumption was incorrect, and
specifically crafted handshakes could trigger panics. Rather than just
surgically replacing the usages of BytesOrPanic in paths that could
panic, replace all usages of it with proper error returns in case there
are other ways of triggering panics which we didn't find.

In one specific case, the tree routed by expandLabel, we replace the
usage of BytesOrPanic, but retain a panic. This function already
explicitly panicked elsewhere, and returning an error from it becomes
rather painful because it requires changing a large number of APIs.
The marshalling is unlikely to ever panic, as the inputs are all either
fixed length, or already limited to the sizes required. If it were to
panic, it'd likely only be during development. A close inspection shows
no paths for a user to cause a panic currently.

This patches ends up being rather large, since it requires routing
errors back through functions which previously had no error returns.
Where possible I've tried to use helpers that reduce the verbosity
of frequently repeated stanzas, and to make the diffs as minimal as
possible.

Thanks to Marten Seemann for reporting this issue.

Updates #58001
Fixes #58359
Fixes CVE-2022-41724

Change-Id: Ieb55867ef0a3e1e867b33f09421932510cb58851
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1679436
Reviewed-by: Julie Qiu <julieqiu@google.com>
TryBot-Result: Security TryBots <security-trybots@go-security-trybots.iam.gserviceaccount.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
(cherry picked from commit 1d4e6ca9454f6cf81d30c5361146fb5988f1b5f6)
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1728205
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/468121
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
connection.go Outdated
// tlsconn will call gnet.WriteTCP() to sent the data directly.
// If gnetConn.outboundBufferis not empty, data will be
// buffered in gnetConn.outboundBuffer.
n, err = c.tlsconn.Write(data)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use outboundBuffer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为 tlsconn.Write 会透明加密 data,然后调用gnetConn.WriteTCP() -> gnetConn.write().

这样就可以直接往socket写数据,而不是copy到outboundbuffer,然后再写数据,性能更好。

在kernel tls开启的时候,可以做到zero-copy,因为内核直接透明加密。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call stack
c.tlsconn.Write -> gnetConn.WriteTCP() -> gnetConn.write()

gnetConn.write() 会透明管理 outboundbuffer

connection.go Outdated Show resolved Hide resolved
connection.go Outdated Show resolved Hide resolved
connection.go Outdated

// Expose the plaintext write API which should only be used
// by tlsconn.Write().
func (c *conn) WriteTCP(p []byte) (int, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this new method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tlsconn.Write() 内部最终会向 gnetConn 写数据,但是gnetConn 内部又有tlsconn 的指针,所以就会有一个死循环。
参照 https://github.com/0-haha/gnet_go_tls/blob/5728fd829790624e21452e217702b9c9964ded45/conn.go#L920-L940

因为tlsconn.Write()是把加密后的数据写到gnetconn,所以需要gnetConn暴露只写明文的API就是gnetConn.write()

connection.go Outdated Show resolved Hide resolved
eventloop.go Outdated Show resolved Hide resolved
go.mod Outdated
@@ -1,11 +1,12 @@
module github.com/panjf2000/gnet/v2

require (
github.com/0-haha/gnet_go_tls/v120 v120.2.0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么版本号不是 v1.20.2 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

版本号用 v1.20.2 go mod tidy 会报错

go: errors parsing go.mod:
/workspace/go.mod:4:2: require github.com/0-haha/gnet_go_tls/v120: version "v1.20.2" invalid: should be v120, not v1

为什么用v120 ?

go mod 不支持v1.x, 必须是int。为了支持多版本go,比如未来的v1.21,所以go 1.20的 go mod 链接用 github.com/0-haha/gnet_go_tls/v120

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

你这种 go mod 版本号的用法太奇怪了,我理解并不需要在 go.mod 里加上 v120,后面的版本号通过 github release 和 tag 来管理就行了。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是针对go 1.20的实现。
https://github.com/0-haha/gnet/blob/dev/pkg/tls/go120.go

针对不同go 版本用go:build 来区分编译的包

go-quic 是直接每个不同的go 版本一个git repo。你看一下他的import就知道了。
https://github.com/quic-go/quic-go/tree/master/internal/qtls

目前还没想到更好的办法在一个仓库里维护多个版本的办法

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panjf2000 go mod 里面是准备一个go版本一个TLS库吗? 维护就只能fork + cherry pick了。

require (
	github.com/0-haha/gnet_go_tls-1-20 v1.20.0
	github.com/0-haha/gnet_go_tls-1-21 v1.21.0
)

或者就是一个repo下两个文件夹,一个文件夹对应一个go版本

我就想到这两种方案,其他的试过了,但是使用的时候必须要在go mod里用replace ,直接跑go mod tidy 会报错。

你倾向哪个方案

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go.mod 里改成 github.com/0-haha/gnet-tls-go1-20 v1.20.2-rc.1

resolved in new commits

@panjf2000 panjf2000 modified the milestones: v2.3.0, Long term, v1.7.0 May 18, 2023
@Fgaoxing
Copy link

@panjf2000 请问gnet何时可以支持TLS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file new feature optimization Some small optimizations pending development Requested PR owner to improve code and waiting for the result working on it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS support?
5 participants