TensorShare
The TensorShare schema is the main class of the project. It is used to share tensors between different backends.
This schema inherits from the pydantic.BaseModel class and has two fields:
tensors: a base64-encoded string of the serialized tensors
size: the size of the tensors in bytes
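To make the two fields concrete, here is a minimal, stdlib-only sketch of a payload that pairs base64-encoded bytes with a size in bytes. The helper name encode_payload and the placeholder bytes are hypothetical, and measuring the size before encoding is an assumption; this is not how TensorShare serializes tensors internally.

```python
import base64
from typing import Tuple


def encode_payload(raw: bytes) -> Tuple[bytes, int]:
    """Return the base64-encoded payload and the size of the raw payload in bytes."""
    # Assumption: the size is measured on the raw bytes, before base64 encoding.
    return base64.b64encode(raw), len(raw)


# Placeholder bytes standing in for serialized tensors (illustrative only).
raw = b"serialized tensor bytes"
encoded, size = encode_payload(raw)

# Base64 is reversible, so the receiver can recover the original bytes.
decoded = base64.b64decode(encoded)
```

Base64 keeps the payload safe to embed in text-based transports such as JSON, at the cost of roughly one third more bytes on the wire.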
Creating a TensorShare object
After installing the package in your project, you can import the TensorShare class from the tensorshare module.
from tensorshare import TensorShare

ts = TensorShare(
    tensors=...,  # Base64-encoded bytes of the serialized tensors, ready to be sent
    size=...,  # Size of the tensors in pydantic.ByteSize format
)
Serializing tensors - from_dict
Because serializing tensors manually is tedious, the package provides a TensorShare.from_dict method that creates a new object from a dictionary of tensors in any supported backend.
from tensorshare import TensorShare

tensors = {
    "embeddings": ...,  # Tensor
    "labels": ...,  # Tensor
}
ts = TensorShare.from_dict(tensors)
with a specific backend
You can specify the backend to use by passing the backend argument to the from_dict method.
Tip
The backend can be specified as a string or as a Backend Enum value. Check the Backends section for more information.
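As an illustration of how a parameter can accept either a string or an Enum value, here is a minimal sketch built on a string-valued Enum. BackendSketch and normalize_backend are hypothetical stand-ins written for this example, not the package's Backend class; the member names mirror the backends mentioned on this page, and the case-insensitive lookup is an assumption.

```python
from enum import Enum
from typing import Union


class BackendSketch(str, Enum):
    """Hypothetical stand-in for a backend Enum with string values."""

    FLAX = "flax"
    PADDLEPADDLE = "paddlepaddle"
    TENSORFLOW = "tensorflow"
    TORCH = "torch"


def normalize_backend(backend: Union[str, BackendSketch]) -> BackendSketch:
    """Accept either an Enum member or its string value and return the member."""
    if isinstance(backend, BackendSketch):
        return backend
    # Looking up by value raises ValueError for unknown backend names.
    return BackendSketch(backend.lower())


from_string = normalize_backend("torch")
from_member = normalize_backend(BackendSketch.FLAX)
```

Normalizing once at the boundary lets the rest of the code compare Enum members instead of raw strings.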
import torch
from tensorshare import TensorShare

tensors = {
    "embeddings": torch.zeros((2, 2)),
    "labels": torch.zeros((2, 2)),
}
ts = TensorShare.from_dict(tensors, backend="torch")
print(ts)
# Output: tensors=b'gAAAAAAAAAB7ImVt...' size=168
If you don't specify the backend, the package tries to infer it from the first tensor in the dictionary, which is not always the best choice. As a general rule, specify the backend explicitly.
Warning
It's not possible (at the moment) to mix tensors from different backends in the same dictionary.
The from_dict method will raise an exception if you try to do so.
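If you want to catch a mixed dictionary before calling from_dict, a check along the following lines is one option. This is a minimal sketch under stated assumptions: infer_backend_name and check_single_backend are hypothetical helpers, and identifying a backend by the top-level module of each value's type is a heuristic chosen for this example, not the package's inference logic. Standard-library objects stand in for tensors so the snippet runs without any backend installed.

```python
from array import array
from typing import Any, Dict


def infer_backend_name(tensor: Any) -> str:
    """Heuristic (assumption): use the top-level module of the value's type."""
    return type(tensor).__module__.split(".")[0]


def check_single_backend(tensors: Dict[str, Any]) -> str:
    """Return the single backend name shared by all values, or raise ValueError."""
    names = {infer_backend_name(t) for t in tensors.values()}
    if len(names) != 1:
        raise ValueError(f"Expected tensors from exactly one backend, got: {sorted(names)}")
    return names.pop()


# Stand-ins for tensors: both values come from the stdlib "array" module.
same_backend = {"embeddings": array("f", [0.0, 1.0]), "labels": array("f", [1.0])}
backend_name = check_single_backend(same_backend)

# A mixed dictionary (an array and a plain list) is rejected.
mixed = {"embeddings": array("f", [0.0]), "labels": [1.0]}
try:
    check_single_backend(mixed)
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```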
backend-specific examples
Here are some examples of creating a TensorShare object from a dictionary of tensors in different backends.
Deserializing tensors
The to_tensors method is the counterpart of from_dict: it deserializes the serialized tensors stored in the TensorShare object. The method expects a backend argument that specifies which backend to deserialize into.
ts = TensorShare(
    tensors=...,  # Base64-encoded bytes of the serialized tensors, ready to be sent
    size=...,  # Size of the tensors in pydantic.ByteSize format
)
tensors = ts.to_tensors(backend=...)
Tip
Again, the backend can be specified as a string or a Backend Enum value. Check the Backends section for more information.
Here are some examples of deserializing the tensors from a TensorShare object in different backends.
You must have the desired backend installed in your project to deserialize the tensors in it.
from tensorshare import TensorShare

ts = TensorShare(
    tensors=...,  # Base64-encoded bytes of the serialized tensors, ready to be sent
    size=...,  # Size of the tensors in pydantic.ByteSize format
)
# Get a dict of jaxlib.xla_extension.ArrayImpl
tensors_flax = ts.to_tensors(backend="flax")  # or backend=Backend.FLAX
from tensorshare import TensorShare

ts = TensorShare(
    tensors=...,  # Base64-encoded bytes of the serialized tensors, ready to be sent
    size=...,  # Size of the tensors in pydantic.ByteSize format
)
# Get a dict of paddle.Tensor
tensors_paddle = ts.to_tensors(backend="paddlepaddle")  # or backend=Backend.PADDLEPADDLE
from tensorshare import TensorShare

ts = TensorShare(
    tensors=...,  # Base64-encoded bytes of the serialized tensors, ready to be sent
    size=...,  # Size of the tensors in pydantic.ByteSize format
)
# Get a dict of tensorflow.Tensor
tensors_tensorflow = ts.to_tensors(backend="tensorflow")  # or backend=Backend.TENSORFLOW
Lazy tensors formatting
If you don't want to handle the formatting of the tensors yourself, the package provides a utility function that prepares tensors for use with the TensorShare class.
from typing import Any

from tensorshare import prepare_tensors_to_dict

tensors_in_any_format: Any = ...
tensors = prepare_tensors_to_dict(tensors_in_any_format)
# Output: {"embeddings_0": ..., "embeddings_1": ..., ...}
Check the utils documentation for more information.
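To give an idea of what this kind of normalization involves, here is a minimal sketch that turns a few common input shapes into a dictionary with indexed keys like the ones shown above. The function to_named_dict and its prefix parameter are hypothetical and written for this example; the exact rules applied by prepare_tensors_to_dict are not described here and may differ.

```python
from typing import Any, Dict


def to_named_dict(tensors: Any, prefix: str = "embeddings") -> Dict[str, Any]:
    """Normalize a dict, a list/tuple, or a single item into a name -> item dict."""
    if isinstance(tensors, dict):
        # Already named: return a shallow copy unchanged.
        return dict(tensors)
    if isinstance(tensors, (list, tuple)):
        # Unnamed sequence: generate indexed keys such as "embeddings_0".
        return {f"{prefix}_{i}": t for i, t in enumerate(tensors)}
    # Single item: wrap it under the first indexed key.
    return {f"{prefix}_0": tensors}


# Strings stand in for tensors so the snippet runs without any backend installed.
from_list = to_named_dict(["first", "second"])
from_single = to_named_dict("only")
from_dict = to_named_dict({"labels": "kept"})
```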
Created: 2023-08-20