Skip to content

Conversation

@trollyxia
Copy link
Member

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@trollyxia trollyxia requested review from a team as code owners August 28, 2025 20:22
@product-auto-label product-auto-label bot added the size: xl Pull request size is extra large. label Aug 28, 2025
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/python-bigtable API. label Aug 28, 2025
Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first pass, let me know what you think

DeadlineExceeded,
ServiceUnavailable,
),
column_info: dict[str, Any] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use something other than Any here?

It seems like the value is only used if it's a Message/EnumTypeWrapper?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

and column_info.get(column_name) is not None
):
default_proto_message = column_info.get(column_name)
if isinstance(default_proto_message, Message):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder of all of these nested conditionals could be simplified by using exception handling instead? (i.e, just assume everything exists, and return value.bytes_value if a parsing exception is thrown)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to refactor the code using a try-except block as you described, but this got me into several mypy errors as below:

google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:175: error: Item "None" of "Optional[Dict[str, Union[Message, EnumTypeWrapper]]]" has no attribute "get"  [union-attr]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:175: error: Argument 1 to "get" of "dict" has incompatible type "Optional[str]"; expected "str"  [arg-type]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:176: error: Missing positional argument "enum_type" in call to "EnumTypeWrapper"  [call-arg]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:176: error: Cannot instantiate type "Type[None]"  [misc]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:177: error: Item "EnumTypeWrapper" of "Union[Message, EnumTypeWrapper, Any]" has no attribute "ParseFromString"  [union-attr]
google/cloud/bigtable/data/execute_query/_query_result_parsing_utils.py:178: error: Incompatible return value type (got "Union[Message, EnumTypeWrapper, Any]", expected "Union[Message, bytes]")  [return-value]

I'm not super familiar with Python language, so I'd appreciate any further advice/suggestion you could provide!

Copy link
Contributor

@daniel-sanche daniel-sanche Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you could resolve that using typing.cast, to tell mypy to assume it's a specific type:

try:
    proto_enum = cast(dict[str, EnumTypeWrapper], column_info).get(column_name)
    return cast(EnumTypeWrapper, proto_enum).Name(value.int_value)
except AttributeError:
    return value.int_value

That said, you might still have to do a None check for column_name, because passing .get(None) doesn't seem to raise an exception

This might be getting into "overly clever" territory though, the original if statements worked fine too. Exception handling was something that came to mind to consider as I was reviewing, but maybe it's not worth it here

) -> Message | bytes:
"""
Parses a serialized protobuf message into a Message object.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some more context here could be helpful.

It seems like it just falls back to value.bytes_value if parsing fails? Is that an expected state, or does that mean something went wrong?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Yes, that's correct. If column_info is not provided or the provided column_info doesn't contain the required type information, then it is expected that we return value.bytes_value. However, if parsing fails then that does mean something went wrong (i.e., the user provided the wrong type information).

In either case, we fall back to return the bytes. This approach/design mirrors the spanner's implementation in googleapis/python-spanner#1084.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yeah I was going to question if it would be better to raise an exception on parse failure, but that could go either way. If spanner does it this way, it's probably good to follow suit

metadata_type: SqlType.Enum,
column_name: str | None,
column_info: dict[str, Any] | None = None,
) -> int | Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this have to be Any?

I looked up EnumTypeWrapper.Name, and it seems like it's supposed to be a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done!



_TYPE_PARSERS: Dict[
Type[SqlType.Type], Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be good to make a type alias for Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any]. Maybe call it ParserCallable or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of TypeAlias seems to break several checks :(

Could you please let me know what I've done wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I don't think all versions we support work with TypeAlias

I think you can just make it implicit by removing the TypeAlias annotation though:

ParserCallable = Callable[
    [PBValue, Any, Optional[str], Optional[Dict[str, Union[Message, EnumTypeWrapper]]]],
    Any,
]

That seems to be what we did for other aliases in this library. Let me know if that causes any issues

from typing import Any, Callable, Dict, Type

from google.protobuf.message import Message
from google.protobuf.internal.enum_type_wrapper import EnumTypeWrapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this safe to use? (I see internal in there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protobuf where deserialization logic is on user-specific code. When provided,
the custom object enables deserialization of backend-received column data.
If not provided, data remains serialized as bytes for Proto Messages and
integer for Proto Enums.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing to me. will users know how to construct this dict? Is there a docs link we can add here or something?

The first and second sentences also seem a bit contradictory, but maybe I'm reading it wrong

Copy link
Member Author

@trollyxia trollyxia Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded the docstring. Hopefully it makes more sense now. Please take another look thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me!

Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! just responded to a couple more comments

DeadlineExceeded,
ServiceUnavailable,
),
column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: other places in this library, we use | instead of the Union (i.e. dict[str, Message | EnumTypeWrapper]). Doesn't really matter though

) -> Message | bytes:
"""
Parses a serialized protobuf message into a Message object.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yeah I was going to question if it would be better to raise an exception on parse failure, but that could go either way. If spanner does it this way, it's probably good to follow suit

metadata_type: SqlType.Array,
column_name: str | None,
column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None,
) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't touch this part, but maybe this should be list[Any]?

metadata_type: SqlType.Map,
column_name: str | None,
column_info: dict[str, Union[Message, EnumTypeWrapper]] | None = None,
) -> Any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above: dict[Any, Any]?

protobuf where deserialization logic is on user-specific code. When provided,
the custom object enables deserialization of backend-received column data.
If not provided, data remains serialized as bytes for Proto Messages and
integer for Proto Enums.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me!



_TYPE_PARSERS: Dict[
Type[SqlType.Type], Callable[[PBValue, Any, str | None, dict[str, Any] | None], Any]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I don't think all versions we support work with TypeAlias

I think you can just make it implicit by removing the TypeAlias annotation though:

ParserCallable = Callable[
    [PBValue, Any, Optional[str], Optional[Dict[str, Union[Message, EnumTypeWrapper]]]],
    Any,
]

That seems to be what we did for other aliases in this library. Let me know if that causes any issues

and column_info.get(column_name) is not None
):
default_proto_message = column_info.get(column_name)
if isinstance(default_proto_message, Message):
Copy link
Contributor

@daniel-sanche daniel-sanche Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you could resolve that using typing.cast, to tell mypy to assume it's a specific type:

try:
    proto_enum = cast(dict[str, EnumTypeWrapper], column_info).get(column_name)
    return cast(EnumTypeWrapper, proto_enum).Name(value.int_value)
except AttributeError:
    return value.int_value

That said, you might still have to do a None check for column_name, because passing .get(None) doesn't seem to raise an exception

This might be getting into "overly clever" territory though, the original if statements worked fine too. Exception handling was something that came to mind to consider as I was reviewing, but maybe it's not worth it here

daniel-sanche
daniel-sanche previously approved these changes Sep 8, 2025
Copy link
Contributor

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daniel-sanche daniel-sanche added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 8, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 8, 2025
path from the top-level column name.
- For STRUCTs: ``struct_column_name.field_name``
- For MAPs: ``map_column_name.key`` or ``map_column_name.value`` to specify types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the with_history maps that have nested struct values I assume this doesn't work? Or does it handle arbitrary nesting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool it looks like this works. Can you add an example of that to the doc as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@daniel-sanche daniel-sanche added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 11, 2025
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 11, 2025
@daniel-sanche daniel-sanche added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 11, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 11, 2025
@daniel-sanche daniel-sanche added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2025
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2025
@daniel-sanche daniel-sanche added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2025
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 12, 2025
@daniel-sanche daniel-sanche merged commit 34ceb86 into googleapis:main Sep 12, 2025
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the googleapis/python-bigtable API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants