SSRegression with two parties, each with multiple physical compute nodes: how can we improve running speed? #1259

Open
kmoker8s opened this issue Apr 18, 2024 · 1 comment

@kmoker8s

Issue Type

Bug

Source

binary

Secretflow Version

1.5.0.dev20240321

OS Platform and Distribution

CentOS Linux release 7.9.2009 (Core)

Python version

3.9

Bazel version

none

GCC/Compiler version

none

What happened and what you expected to happen.

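Two parties take part in the computation, and each party has multiple physical compute nodes. We use SecretFlow's SSRegression to train a logistic regression model. For this setup, we would like a configuration or solution that speeds up the system as a whole.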

Reproduction code to reproduce the issue.

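import secretflow as sf
import spu
from secretflow.data.vertical import read_csv as v_read_csv
from secretflow.ml.linear.ss_sgd import SSRegression

# alice_ip, bob_ip, the CSV paths and the hyperparameters
# (epoch, learning_rate, batch_size, l2_r) are defined elsewhere in the script.
sf.shutdown()
sf.init(parties=['alice', 'bob'], address=alice_ip + ':9394')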
#sf.init(address='192.168.207.221:9394', cluster_config=cluster_config)
alice = sf.PYU('alice')
bob = sf.PYU('bob')
#carol = sf.PYU('carol')

spu_config = {
            'nodes': [
                {'party': 'alice', 'id': 'local:0', 'address': alice_ip + ':12945'},
                {'party': 'bob', 'id': 'local:1', 'address': bob_ip + ':12946'},
                # {'party': 'carol', 'id': 'local:2', 'address': '127.0.0.1:12347'},
            ],
            'runtime_config': {
                # SEMI2K supports 2PC or 3PC, ABY3 supports only 3PC, CHEETAH supports only 2PC.
                # The number of nodes above must match the protocol's party count.
                'protocol': spu.spu_pb2.SEMI2K,
                'field': spu.spu_pb2.FM128,
            },
        }
# Create the SPU device from the settings above

my_spu = sf.SPU(spu_config)


# your code to run.
# init log

#logging.basicConfig(stream=sys.stdout, level=logging.INFO)    
#start = time.time()
timecollect.start()  # the user's own timing helper, not part of SecretFlow

train_vdf = v_read_csv(
    {alice: train_alice_path, bob: train_bob_path},
    keys="row_num",
    drop_keys="row_num",
    spu=my_spu,
    psi_protocl="ECDH_PSI_2PC"
)

test_vdf = v_read_csv(
    {alice: test_alice_path, bob: test_bob_path},
    keys="row_num",
    drop_keys="row_num",
    spu=my_spu,
    psi_protocl="ECDH_PSI_2PC"
)
# Initialize the model
lr_model = SSRegression(my_spu)

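# Train the model
# (X_train / y_train are the feature and label columns taken from train_vdf;
#  that split is not shown in the original snippet)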
lr_model.fit(
    x=X_train,
    y=y_train,
    epochs=epoch,
    learning_rate=learning_rate,
    batch_size=batch_size,
    sig_type='t3',
    reg_type='logistic',
    penalty='l2',
    l2_norm=l2_r,
    eps=0.0001
)
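The comments in `spu_config` state a constraint worth checking before launching: the number of entries in `nodes` must match the party count of the chosen protocol (SEMI2K: 2 or 3, ABY3: 3, CHEETAH: 2). As that comment implies, each entry stands for one protocol party, so this list is not a place to register extra machines for alice or bob. Below is a minimal sketch of a pre-flight check. `check_spu_nodes` and `SUPPORTED_PARTY_COUNTS` are hypothetical names. Protocols are passed as plain strings instead of `spu.spu_pb2` enums so the sketch runs without SPU installed.

```python
from typing import Dict, List

# Party counts per protocol, taken from the comment in the issue's spu_config.
# Hypothetical helper data, not an SPU API.
SUPPORTED_PARTY_COUNTS = {
    "SEMI2K": {2, 3},
    "ABY3": {3},
    "CHEETAH": {2},
}


def check_spu_nodes(protocol: str, nodes: List[Dict[str, str]]) -> None:
    """Raise ValueError if len(nodes) does not match the protocol's party count."""
    allowed = SUPPORTED_PARTY_COUNTS.get(protocol)
    if allowed is None:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if len(nodes) not in allowed:
        raise ValueError(
            f"{protocol} expects {sorted(allowed)} node(s), got {len(nodes)}"
        )


if __name__ == "__main__":
    # Same shape as the issue's 'nodes' list; addresses are placeholders.
    nodes = [
        {"party": "alice", "id": "local:0", "address": "alice_ip:12945"},
        {"party": "bob", "id": "local:1", "address": "bob_ip:12946"},
    ]
    check_spu_nodes("SEMI2K", nodes)   # 2 nodes: accepted
    check_spu_nodes("CHEETAH", nodes)  # 2 nodes: accepted
```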
@da-niao-dan
Member

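The bottleneck in SS Regression is the network, not compute resources. Even a single 8-core machine does not reach full CPU utilization.
There is no way to speed up SS Regression simply by configuring a cluster of physical machines.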
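The following back-of-envelope cost model shows why adding machines does not help when the network is the bottleneck. It is a minimal sketch under stated assumptions, not SecretFlow's actual cost accounting. It assumes that each SGD step needs a fixed number of interactive rounds between alice and bob and that traffic grows linearly with the number of samples processed. `Link`, `estimate_comm_seconds`, `rounds_per_step` and `bytes_per_sample` are hypothetical names, and the numbers in the usage lines are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Link:
    """Network link between the two parties (hypothetical parameters)."""
    rtt_s: float          # round-trip time, seconds
    bandwidth_Bps: float  # usable bandwidth, bytes per second


def estimate_comm_seconds(n_samples: int, batch_size: int, epochs: int,
                          rounds_per_step: int, bytes_per_sample: float,
                          link: Link) -> float:
    """Rough estimate of time spent waiting on the network during training.

    Assumptions (illustrative only): every SGD step costs `rounds_per_step`
    interactive rounds, and traffic is linear in the samples processed.
    Local CPU time is deliberately left out of the model.
    """
    steps = epochs * math.ceil(n_samples / batch_size)  # total SGD steps
    latency_s = steps * rounds_per_step * link.rtt_s    # one RTT per round
    transfer_s = epochs * n_samples * bytes_per_sample / link.bandwidth_Bps
    return latency_s + transfer_s


if __name__ == "__main__":
    lan = Link(rtt_s=0.001, bandwidth_Bps=1e9 / 8)    # ~1 ms, 1 Gbit/s
    wan = Link(rtt_s=0.030, bandwidth_Bps=100e6 / 8)  # ~30 ms, 100 Mbit/s
    for name, link in [("lan", lan), ("wan", wan)]:
        for bs in (128, 1024):
            t = estimate_comm_seconds(100_000, bs, 10, rounds_per_step=20,
                                      bytes_per_sample=2_000, link=link)
            print(f"{name} batch_size={bs}: ~{t:.0f} s on the network")
```

Under these assumptions, the number of CPUs never enters the estimate. The estimate changes only with round-trip latency, bandwidth, and the number of SGD steps, and fewer, larger batches mean fewer rounds.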
