Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/flatten serialization #2056

Merged
merged 32 commits into from
May 20, 2024
Merged

Feature/flatten serialization #2056

merged 32 commits into from
May 20, 2024

Conversation

jieguangzhou
Copy link
Collaborator

Description

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make unit-testing and make integration-testing successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

@jieguangzhou jieguangzhou mentioned this pull request May 15, 2024
5 tasks
@jieguangzhou
Copy link
Collaborator Author

          **Solution for inline-encoding**

If a Document has a Schema (e.g. in ibis all Document instances have a Schema) then:

  • bytes are encoded directly (don't need a _BaseEncodable in "_leaves".
  • if Encodable: then the bytes are saved directly in the field (as raw or Base64)
  • if Artifact: then the bytes are referred to with &<file_id>
  • if File: then the file is referred to with file://

Originally posted by @blythed in #2049 (comment)

@jieguangzhou jieguangzhou force-pushed the feature/flatten_serialization branch 3 times, most recently from 0799ec1 to bafde5b Compare May 16, 2024 16:23
@jieguangzhou jieguangzhou marked this pull request as ready for review May 17, 2024 07:48
@jieguangzhou jieguangzhou force-pushed the feature/flatten_serialization branch 4 times, most recently from dd9bf63 to 95cec4e Compare May 17, 2024 08:16
@jieguangzhou jieguangzhou linked an issue May 17, 2024 that may be closed by this pull request
return r

def delete_artifact(self, r: t.Dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure we need this method.
Since the artifacts and files are flat in Document.encode, we can
just directly with save_bytes, save_file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal however - can be refactored later.

return current


def parse_query(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: something with a != b, a == b, a >= b, a <= b etc..



# TODO: (New) remove the unused code
def _from_dict(r: t.Any, db: None = None) -> t.Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no longer needed.


def __repr__(self):
return '<EMPTY>'


class IntermidiaType:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intermediate type?

symbol_table = self.db[identifier]
symbol_table = symbol_table.relabel(
# TODO: Check for folds
{'output': identifier, '_fold': f'fold.{identifier}'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

Copy link
Collaborator

@blythed blythed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking very good, just a few things I would like to understand with @jieguangzhou and @kartik4949.

@kartik4949 kartik4949 force-pushed the feature/flatten_serialization branch from bb690fe to 5e54403 Compare May 20, 2024 09:53
@blythed blythed force-pushed the feature/flatten_serialization branch from 5e54403 to bb690fe Compare May 20, 2024 09:55
@blythed blythed force-pushed the feature/flatten_serialization branch 7 times, most recently from 145c602 to 3bcb755 Compare May 20, 2024 13:27
@blythed blythed force-pushed the feature/flatten_serialization branch from 3bcb755 to ad0f535 Compare May 20, 2024 14:34
@blythed blythed self-requested a review May 20, 2024 17:37
@blythed blythed merged commit dbd5b71 into main May 20, 2024
3 checks passed
@blythed blythed deleted the feature/flatten_serialization branch May 20, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SERIALIZE] Make Schema an option for MongoDB
3 participants