Natively support creating a TorchArrow column from a numpy array #179

Open
scotts opened this issue Feb 4, 2022 · 1 comment
scotts commented Feb 4, 2022

When a user creates a column from a Python list, the data is handed straight to C++. For example,

vals = [1, 2, 3, 4, 5]
col = ta.Column(vals, device="cpu")

That call is dispatched to C++ through pybind11:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/csrc/velox/lib.cpp#L135-L141
However, if a user creates a column from a numpy array, we currently have to handle that (slowly) in Python. For example,

vals = [1, 2, 3, 4, 5]
arr = numpy.array(vals)
col = ta.Column(arr, device="cpu")

That will be handled only on the Python side:
https://github.com/facebookresearch/torcharrow/blob/d680bfdc0f6a6bb6c3a29c2a67d62006782d6558/torcharrow/scope.py#L226-L233
We should be able to handle numpy arrays natively in C++; pybind11 already exposes a numpy array type.
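For context, here is a rough sketch of the metadata a native binding could read instead of walking the elements one by one in Python. It goes through the Python buffer protocol, which numpy arrays export; as far as I know, pybind11's `py::array::request()` returns the same kind of information as a `py::buffer_info`. The helper name `describe_buffer` is made up for illustration, and the stdlib `array` type stands in for a numpy array so the snippet has no dependencies.

```python
from array import array


def describe_buffer(obj):
    """Return the buffer-protocol metadata exported by `obj` (hypothetical helper)."""
    # memoryview only inspects the exporter's memory; it does not copy the data
    # or create one Python object per element.
    with memoryview(obj) as view:
        return {
            "format": view.format,        # struct-style element type code
            "itemsize": view.itemsize,    # bytes per element
            "shape": view.shape,          # number of elements per dimension
            "strides": view.strides,      # bytes to step per dimension
            "nbytes": view.nbytes,        # total size of the data in bytes
            "c_contiguous": view.c_contiguous,
        }


vals = [1, 2, 3, 4, 5]
buf = array("q", vals)  # signed 64-bit integers; stand-in for a numpy array
info = describe_buffer(buf)
print(info)
```

Passing `numpy.array(vals)` to the same helper should work the same way, although the reported `format` code for a given dtype can differ between platforms. With a pointer, item size, shape, and strides in hand, the C++ side could in principle copy or wrap the data in one pass.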

wenleix commented Feb 4, 2022

Here is the original from_numpy API prototype: https://github.com/facebookresearch/torcharrow/blob/95daa1fabd5a3098be112d487e085e13f5447786/torcharrow/_interop.py#L88-L100

But I don't think we have ever supported this natively in the CPU backend (only in the "demo" backend, where data was stored as numpy arrays; that backend was removed in #33).

Some API reference:

YLGH pushed a commit to YLGH/torcharrow that referenced this issue May 7, 2022
Summary:
Pull Request resolved: pytorch/torchrec#179

* add the `expand_into_jagged_permute` GPU kernel callsite for generating 1D sparse data permute

Reviewed By: youyou6093

Differential Revision: D34778094

fbshipit-source-id: d14174cea809f3e33b1d860d297c7d318a930e34