[benchmark] optimize benchmark: counting tokenizer tokens and error requests #1607

Open

NiuBlibing wants to merge 15 commits into main

Conversation

@NiuBlibing (Contributor) commented on May 17, 2024

Motivation

  1. support the https scheme for the benchmark
  2. count the real number of output tokens
  3. count local tokenizer throughput
  4. count error requests
  5. support setting the role in the prompt
  6. support benchmarking the OpenAI API

BC-breaking (Optional)

None

Use cases (Optional)

For more accurate benchmarking.

```python
timestamps.append(time.perf_counter())

first_token_latency = np.round(timestamps[1] - timestamps[0], 3)
token_latency = np.round(timestamps[-1] - timestamps[0], 3)
# assert output.pop('finish_reason') == 'length', \
#     f'Error. session_id({session_id}) request {output_seqlen} ' \
#     f'tokens, but `finish_reason` is not `length`'
total_tokens = input_seqlen + output_seqlen
# time how long it takes to re-encode the generated text locally
tokenlizer_start = time.perf_counter()
real_output_seqlen = len(self.tokenizer(full_output).input_ids)
```
Collaborator commented:

Encoding the text affects the inference performance.
I don't suggest doing that.
If the output seqlen is needed, the server can return it.
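
For non-stream requests, an OpenAI-compatible server already reports the generated token count in the response's usage field, so the client would not need to re-encode the text. A rough sketch (the endpoint URL and model name below are placeholders, not this PR's actual config):

```python
# Rough sketch, assuming an OpenAI-compatible /v1/chat/completions endpoint.
import requests

resp = requests.post(
    'http://localhost:23333/v1/chat/completions',
    json={
        'model': 'qwen-72b-chat',
        'messages': [{'role': 'user', 'content': 'hello'}],
        'stream': False,
    },
    timeout=300,
).json()

# server-side count of generated tokens, no local tokenizer needed
real_output_seqlen = resp['usage']['completion_tokens']
```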

@NiuBlibing (Contributor, Author) replied:

In stream mode, this statistic is not returned.

In my tests (Qwen-72B-Chat), the tokenizer takes only 0.027% of the total benchmark elapsed time (tokenizer speed: 77402.264 tokens/s for one concurrency), and the tokenizer time is subtracted in the final stats code:

```python
stats = np.concatenate(stats).reshape(-1, 6)

# column 5 holds each request's tokenizer time; subtract its average per
# concurrent worker so local re-encoding does not count against the server
tokenlizer_time = np.sum(stats[:, 5], axis=0) / concurrency
elapsed_time -= tokenlizer_time
```

@NiuBlibing (Contributor, Author) added:

> Encoding the text affects the inference performance. I don't suggest doing that. If the output seqlen is needed, the server can return it.

lmdeploy doesn't yet support stream_options, so these stats can't be fetched from the server in stream mode.
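
For reference, with servers that do implement it, the OpenAI streaming API can return usage stats in the final chunk, along these lines (sketch only; base_url and model name are placeholders):

```python
# Sketch of how stream_options would make local re-tokenization unnecessary,
# assuming a server that supports it (lmdeploy does not yet).
from openai import OpenAI

client = OpenAI(base_url='http://localhost:23333/v1', api_key='none')

stream = client.chat.completions.create(
    model='qwen-72b-chat',
    messages=[{'role': 'user', 'content': 'hello'}],
    stream=True,
    stream_options={'include_usage': True},  # ask the server to report usage
)

completion_tokens = 0
for chunk in stream:
    # with include_usage=True the last chunk has empty choices and a usage field
    if chunk.usage is not None:
        completion_tokens = chunk.usage.completion_tokens
print(completion_tokens)
```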

@NiuBlibing changed the title from "[benchmark] optimize counting of output tokens" to "[benchmark] optimize benchmark: counting tokenizer tokens and error requests" on May 17, 2024
@lvhan028 (Collaborator) commented:

what kind of errors?

@NiuBlibing (Contributor, Author) replied:

> what kind of errors?

Such as OOM, account limits, etc.
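
A minimal sketch of how failed requests could be counted on the client side (function and counter names here are illustrative, not the PR's actual code):

```python
# Count requests that fail with HTTP or connection errors (e.g. 429 rate
# limits, 5xx from OOM) instead of silently skewing the throughput numbers.
import requests

error_requests = 0

def send_request(url, payload):
    global error_requests
    try:
        resp = requests.post(url, json=payload, timeout=300)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        error_requests += 1  # record the failure and keep the benchmark running
        return None
```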

@NiuBlibing closed this on May 20, 2024
@NiuBlibing reopened this on May 20, 2024
@NiuBlibing mentioned this pull request on May 21, 2024
@NiuBlibing closed this on May 21, 2024
@NiuBlibing reopened this on May 21, 2024