-
Notifications
You must be signed in to change notification settings - Fork 61
feat: Add support for Proto and Enum types #1202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a first pass, let me know what you think
| DeadlineExceeded, | ||
| ServiceUnavailable, | ||
| ), | ||
| column_info: dict[str, Any] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use something other than Any here?
It seems like the value is only used if it's a Message/EnumTypeWrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
| and column_info.get(column_name) is not None | ||
| ): | ||
| default_proto_message = column_info.get(column_name) | ||
| if isinstance(default_proto_message, Message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder of all of these nested conditionals could be simplified by using exception handling instead? (i.e, just assume everything exists, and return value.bytes_value if a parsing exception is thrown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to refactor the code using a try-except block as you described, but this got me into several mypy errors as below:
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:175: error: Item "None" of "Optional[Dict[str, Union[Message, EnumTypeWrapper]]]" has no attribute "get" [union-attr]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:175: error: Argument 1 to "get" of "dict" has incompatible type "Optional[str]"; expected "str" [arg-type]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:176: error: Missing positional argument "enum_type" in call to "EnumTypeWrapper" [call-arg]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:176: error: Cannot instantiate type "Type[None]" [misc]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:177: error: Item "EnumTypeWrapper" of "Union[Message, EnumTypeWrapper, Any]" has no attribute "ParseFromString" [union-attr]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:178: error: Incompatible return value type (got "Union[Message, EnumTypeWrapper, Any]", expected "Union[Message, bytes]") [return-value]
I'm not super familiar with Python language, so I'd appreciate any further advice/suggestion you could provide!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you could resolve that using typing.cast, to tell mypy to assume it's a specific type:
try:
proto_enum = cast(dict[str, EnumTypeWrapper], column_info).get(column_name)
return cast(EnumTypeWrapper, proto_enum).Name(value.int_value)
except AttributeError:
return value.int_value
That said, you might still have to do a None check for column_name, because passing .get(None) doesn't seem to raise an exception
This might be getting into "overly clever" territory though, the original if statements worked fine too. Exception handling was something that came to mind to consider as I was reviewing, but maybe it's not worth it here
| ) -> Message | bytes: | ||
| """ | ||
| Parses a serialized protobuf message into a Message object. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some more context here could be helpful.
It seems like it just falls back to value.bytes_value if parsing fails? Is that an expected state, or does that mean something went wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Yes, that's correct. If column_info is not provided or the provided column_info doesn't contain the required type information, then it is expected that we return value.bytes_value. However, if parsing fails then that does mean something went wrong (i.e., the user provided the wrong type information).
In either case, we fall back to return the bytes. This approach/design mirrors the spanner's implementation in googleapis/python-spanner#1084.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah I was going to question if it would be better to raise an exception on parse failure, but that could go either way. If spanner does it this way, it's probably good to follow suit
| metadata_type: SqlType.Enum, | ||
| column_name: str | None, | ||
| column_info: dict[str, Any] | None = None, | ||
| ) -> int | Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this have to be Any?
I looked up EnumTypeWrapper.Name, and it seems like it's supposed to be a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Done!
|
|
||
|
|
||
| _TYPE_PARSERS: Dict[ | ||
| Type[SqlType.Type], Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it might be good to make a type alias for Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any]. Maybe call it ParserCallable or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of TypeAlias seems to break several checks :(
Could you please let me know what I've done wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, I don't think all versions we support work with TypeAlias
I think you can just make it implicit by removing the TypeAlias annotation though:
ParserCallable = Callable[
[PBValue, Any, Optional[str], Optional[Dict[str, Union[Message, EnumTypeWrapper]]]],
Any,
]
That seems to be what we did for other aliases in this library. Let me know if that causes any issues
| from typing import Any, Callable, Dict, Type | ||
|
|
||
| from google.protobuf.message import Message | ||
| from google.protobuf.internal.enum_type_wrapper import EnumTypeWrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this safe to use? (I see internal in there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python-spanner uses the same thing (https://github.com/search?q=repo%3Agoogleapis%2Fpython-spanner+google.protobuf.internal.enum_type_wrapper&type=code) so I'm assuming it should be fine.
| protobuf where deserialization logic is on user-specific code. When provided, | ||
| the custom object enables deserialization of backend-received column data. | ||
| If not provided, data remains serialized as bytes for Proto Messages and | ||
| integer for Proto Enums. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing to me. will users know how to construct this dict? Is there a docs link we can add here or something?
The first and second sentences also seem a bit contradictory, but maybe I'm reading it wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded the docstring. Hopefully it makes more sense now. Please take another look thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! just responded to a couple more comments
| DeadlineExceeded, | ||
| ServiceUnavailable, | ||
| ), | ||
| column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: other places in this library, we use | instead of the Union (i.e. dict[str, Message | EnumTypeWrapper]). Doesn't really matter though
| ) -> Message | bytes: | ||
| """ | ||
| Parses a serialized protobuf message into a Message object. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, yeah I was going to question if it would be better to raise an exception on parse failure, but that could go either way. If spanner does it this way, it's probably good to follow suit
| metadata_type: SqlType.Array, | ||
| column_name: str | None, | ||
| column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None, | ||
| ) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't touch this part, but maybe this should be list[Any]?
| metadata_type: SqlType.Map, | ||
| column_name: str | None, | ||
| column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None, | ||
| ) -> Any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above: dict[Any, Any]?
| protobuf where deserialization logic is on user-specific code. When provided, | ||
| the custom object enables deserialization of backend-received column data. | ||
| If not provided, data remains serialized as bytes for Proto Messages and | ||
| integer for Proto Enums. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me!
|
|
||
|
|
||
| _TYPE_PARSERS: Dict[ | ||
| Type[SqlType.Type], Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, I don't think all versions we support work with TypeAlias
I think you can just make it implicit by removing the TypeAlias annotation though:
ParserCallable = Callable[
[PBValue, Any, Optional[str], Optional[Dict[str, Union[Message, EnumTypeWrapper]]]],
Any,
]
That seems to be what we did for other aliases in this library. Let me know if that causes any issues
| and column_info.get(column_name) is not None | ||
| ): | ||
| default_proto_message = column_info.get(column_name) | ||
| if isinstance(default_proto_message, Message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you could resolve that using typing.cast, to tell mypy to assume it's a specific type:
try:
proto_enum = cast(dict[str, EnumTypeWrapper], column_info).get(column_name)
return cast(EnumTypeWrapper, proto_enum).Name(value.int_value)
except AttributeError:
return value.int_value
That said, you might still have to do a None check for column_name, because passing .get(None) doesn't seem to raise an exception
This might be getting into "overly clever" territory though, the original if statements worked fine too. Exception handling was something that came to mind to consider as I was reviewing, but maybe it's not worth it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| path from the top-level column name. | ||
| - For STRUCTs: ``struct_column_name.field_name`` | ||
| - For MAPs: ``map_column_name.key`` or ``map_column_name.value`` to specify types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the with_history maps that have nested struct values I assume this doesn't work? Or does it handle arbitrary nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool it looks like this works. Can you add an example of that to the doc as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕