-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add a C++ native StringFormat #1289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, the current state-of-the-art in printf replacement is fmt (proposed for standardization). It's overkill for our purposes, just mentioning it in case you haven't run into it.
| } | ||
|
|
||
| TEST(StringFormatTest, Pointer) { | ||
| // pointers implicitly convert to bool. Make sure this doesn't happen in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: also test with nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| }; | ||
|
|
||
| Foo foo; | ||
| EXPECT_EQ("Hello Foo", StringFormat("Hello %s", foo.ToString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: why not just test passing a std::string, without using a class with a ToString converter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally intended to make it such that the implementation would call ToString() for you, but couldn't make that work in the time I allowed myself to try to implement it. I've added explicit CString and String tests and removed this test.
| TEST(StringFormatTest, Missing) { | ||
| EXPECT_EQ("Hello <missing>", StringFormat("Hello %s")); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: some additional tests to consider:
- literal
%; - format specifier in the beginning of the string (
StringFormat("%s world", "Hello")); - format string that doesn't end with a format specifier (
StringFormat("Hello%sworld, " ")); - passing empty string as an argument;
- more than one format specifier with nothing else in-between (
StringFormat("Hello%s%s", " ", "world"),StringFormat("%%%s%%), "cmd_style_variable); - more arguments than format specifiers (
StringFormat("Hello %s", 1, 2, 3)); absl::string_viewas an argument (assuming it works).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literal %;
Done
format specifier in the beginning of the string (StringFormat("%s world", "Hello"));
Done
format string that doesn't end with a format specifier (StringFormat("Hello%sworld, " "));
Done
passing empty string as an argument;
This was already in the Empty case.
more than one format specifier with nothing else in-between (StringFormat("Hello%s%s", " ", "world"), StringFormat("%%%s%%), "cmd_style_variable);
Done--though I'm not sure if "cmd_style_variable was other than a typo. If there's some secret ninja feature of C++ I'm not testing here, please let me know (and I wouldn't be surprised--this seems to happen all the time).
more arguments than format specifiers (StringFormat("Hello %s", 1, 2, 3));
Done
absl::string_view as an argument (assuming it works).
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done--though I'm not sure if "cmd_style_variable was other than a typo.
Sorry, unfortunately that was a typo. I seem to vaguely remember that in Windows .cmd files, variables are referenced using %variable% syntax, which is what was referring to.
This was already in the Empty case.
Sorry, overlooked that the format string wasn't empty in that case.
| */ | ||
| template <typename... FA> | ||
| std::string StringFormat(const char* format, const FA&... args) { | ||
| return internal::StringFormatPieces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: assert that format is not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm loathe to assert anything in here since this is intended to be part of the implementation of assert. Additionally, that would creates a circularity that maybe I could resolve by removing the firebase_firestore_util_base target. This seems like more trouble than it's worth. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't consider the circularity. One option is to silently return, but it could mask bugs. I suppose the best option is to leave as is.
| * | ||
| * The following format specifiers are recognized: | ||
| * * "%%" - A literal "%" | ||
| * * "%s" - The next parameter is copied through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: consider specifying failure strategy:
- what happens if the number of format specifiers and arguments don't match?
- what happens if an invalid specifier is passed?
- which types are supported as
args? (presumably primitives, C-style strings, andstd::string)
I understand the details might change, so perhaps just a general statement on whether bad input is tolerated or not (i.e., will it produce some result or crash).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this implementation tolerant of bad input mostly because we use it in generating error messages and didn't want to create a double-fault situation. I hesitate to make this extensible and complicate the interface.
For now I've just added comments about what will happen if you mess up.
One question that's open to debate is what to do if there's an argument for an invalid specifier. For now I treat the invalid specifier as garbage and don't advance the current piece but I could see this going the other way. If you prefer I could discard the current argument or insert the current piece instead. LMK if you have any preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think leaving as is is fine. As long as the function doesn't blow up, either of those behaviors seem reasonable -- it's a debug-only function, after all.
| * formatting of the value as an unsigned integer. | ||
| * * Otherwise the value is interpreted as anything absl::AlphaNum accepts. | ||
| */ | ||
| class FormatArg : public absl::AlphaNum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness' sake, it looks like AlphaNum is unsupported for use outside Abseil. I don't object to using it, though, unless there's a trivial workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't, short of just copying the code for those cases here. Since we control the import I'm relatively OK with this too. With your help in testing coverage suggestions I think we'll know if they break us.
I considered briefly attempting to push this upstream but they now have documented behavior that booleans format as "0" or "1" so I'm fairly well convinced that the template magic to detect exactly boolean wouldn't be accepted.
| const char* spec_ptr = percent_ptr + 1; | ||
| if (spec_ptr == format_end) { | ||
| // Incomplete specifier, treat as a literal "%" and be done. | ||
| result.append("%", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you can just push back a char, result.push_back('%');.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| auto pieces_iter = pieces.begin(); | ||
| auto pieces_end = pieces.end(); | ||
|
|
||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional nit: I don't really like for(ever) loops, because it's necessary to read the whole implementation to find out how the loop iterates. For this reason, I'll try to provide an alternative version, but as usual, it is entirely optional, feel free to ignore:
auto end = format + strlen(format);
const auto next_specifier = [&](const char* i) { return std::find(i, end, '%'); };
for (auto from = format, to = next_specifier(from); to != end;
from = to + 2, to = next_specifier(from)) {
result.append(from, to);
bool complete_specifier = append_specifier(to);
if (!complete_specifier)
break;
}
result.append(to, end); // This is now outside the loop
// ...
const auto append_specifier = [&](const char* i) {
if (i == end) return true;
if (i + 1 == end) {
result += '%';
return false;
}
switch (i[1]) {
// ...
}
return true;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you're not a fan, but in cases like these I find the contortions required to fit legibly into a traditional for loop are a cure worse than the disease. I've pulled some of the details out of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the updated version is a lot more readable, thanks for doing this.
| } | ||
|
|
||
| // Examine the specifier: | ||
| const char* spec_ptr = percent_ptr + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: consider extracting the logic related to the specifier into a helper function or a lambda, so that the loop is easier to examine at a glance.
|
|
||
| // percent either points to the next format specifier or the end of the | ||
| // format string. Either is safe to append here: | ||
| result.append(format_iter, percent_ptr - format_iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I think append has an overload that accepts two iterators, which should make append(format_iter, percent_ptr) work without the need to calculate distance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I had convinced myself that overload came in C++14, but you're right. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons string interface is so bloated is that, if I remember correctly, it originally used "pointer+size" style, but during standardization they also added many "begin+end iterator" style overloads to harmonize it with STL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware of fmt. If you'd prefer I could make this fmt compatible in that the one specifier I would accept would be "{}".
|
|
||
| // percent either points to the next format specifier or the end of the | ||
| // format string. Either is safe to append here: | ||
| result.append(format_iter, percent_ptr - format_iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I had convinced myself that overload came in C++14, but you're right. Done.
| const char* spec_ptr = percent_ptr + 1; | ||
| if (spec_ptr == format_end) { | ||
| // Incomplete specifier, treat as a literal "%" and be done. | ||
| result.append("%", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| * formatting of the value as an unsigned integer. | ||
| * * Otherwise the value is interpreted as anything absl::AlphaNum accepts. | ||
| */ | ||
| class FormatArg : public absl::AlphaNum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't, short of just copying the code for those cases here. Since we control the import I'm relatively OK with this too. With your help in testing coverage suggestions I think we'll know if they break us.
I considered briefly attempting to push this upstream but they now have documented behavior that booleans format as "0" or "1" so I'm fairly well convinced that the template magic to detect exactly boolean wouldn't be accepted.
| * | ||
| * The following format specifiers are recognized: | ||
| * * "%%" - A literal "%" | ||
| * * "%s" - The next parameter is copied through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this implementation tolerant of bad input mostly because we use it in generating error messages and didn't want to create a double-fault situation. I hesitate to make this extensible and complicate the interface.
For now I've just added comments about what will happen if you mess up.
One question that's open to debate is what to do if there's an argument for an invalid specifier. For now I treat the invalid specifier as garbage and don't advance the current piece but I could see this going the other way. If you prefer I could discard the current argument or insert the current piece instead. LMK if you have any preference.
| // pointers implicitly convert to bool. Make sure this doesn't happen in | ||
| // this API. | ||
| int value = 4; | ||
| EXPECT_NE("Hello true", StringFormat("Hello %s", &value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm willing to go out on a limb and say that passing const char* that is not a C string is irresponsible. I'm not really willing to convolute the API to support it, since you can straightforwardly work around this by casting void*. I've added documentation to this effect.
| }; | ||
|
|
||
| Foo foo; | ||
| EXPECT_EQ("Hello Foo", StringFormat("Hello %s", foo.ToString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally intended to make it such that the implementation would call ToString() for you, but couldn't make that work in the time I allowed myself to try to implement it. I've added explicit CString and String tests and removed this test.
| TEST(StringFormatTest, Missing) { | ||
| EXPECT_EQ("Hello <missing>", StringFormat("Hello %s")); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literal %;
Done
format specifier in the beginning of the string (StringFormat("%s world", "Hello"));
Done
format string that doesn't end with a format specifier (StringFormat("Hello%sworld, " "));
Done
passing empty string as an argument;
This was already in the Empty case.
more than one format specifier with nothing else in-between (StringFormat("Hello%s%s", " ", "world"), StringFormat("%%%s%%), "cmd_style_variable);
Done--though I'm not sure if "cmd_style_variable was other than a typo. If there's some secret ninja feature of C++ I'm not testing here, please let me know (and I wouldn't be surprised--this seems to happen all the time).
more arguments than format specifiers (StringFormat("Hello %s", 1, 2, 3));
Done
absl::string_view as an argument (assuming it works).
Done
| */ | ||
| template <typename... FA> | ||
| std::string StringFormat(const char* format, const FA&... args) { | ||
| return internal::StringFormatPieces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm loathe to assert anything in here since this is intended to be part of the implementation of assert. Additionally, that would creates a circularity that maybe I could resolve by removing the firebase_firestore_util_base target. This seems like more trouble than it's worth. WDYT?
| auto pieces_iter = pieces.begin(); | ||
| auto pieces_end = pieces.end(); | ||
|
|
||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you're not a fan, but in cases like these I find the contortions required to fit legibly into a traditional for loop are a cure worse than the disease. I've pulled some of the details out of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. fmt -- it emulates Python3 style of formatting strings. I think either approach is reasonable, so I'm fine with whichever is your preference. If you have plans to expand StringFormat, then perhaps it would be better to adopt {} style (it also feels more C++11, with all the curly braces, and we could even do {auto} custom extension). However, at this point it looks like a utility function for debugging, so perhaps it's not worth the trouble doing any changes.
| // pointers implicitly convert to bool. Make sure this doesn't happen in | ||
| // this API. | ||
| int value = 4; | ||
| EXPECT_NE("Hello true", StringFormat("Hello %s", &value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks for adding docs.
| TEST(StringFormatTest, Missing) { | ||
| EXPECT_EQ("Hello <missing>", StringFormat("Hello %s")); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done--though I'm not sure if "cmd_style_variable was other than a typo.
Sorry, unfortunately that was a typo. I seem to vaguely remember that in Windows .cmd files, variables are referenced using %variable% syntax, which is what was referring to.
This was already in the Empty case.
Sorry, overlooked that the format string wasn't empty in that case.
| * | ||
| * The following format specifiers are recognized: | ||
| * * "%%" - A literal "%" | ||
| * * "%s" - The next parameter is copied through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think leaving as is is fine. As long as the function doesn't blow up, either of those behaviors seem reasonable -- it's a debug-only function, after all.
| auto pieces_iter = pieces.begin(); | ||
| auto pieces_end = pieces.end(); | ||
|
|
||
| while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the updated version is a lot more readable, thanks for doing this.
| */ | ||
| template <typename... FA> | ||
| std::string StringFormat(const char* format, const FA&... args) { | ||
| return internal::StringFormatPieces( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't consider the circularity. One option is to silently return, but it could mask bugs. I suppose the best option is to leave as is.
|
|
||
| // percent either points to the next format specifier or the end of the | ||
| // format string. Either is safe to append here: | ||
| result.append(format_iter, percent_ptr - format_iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons string interface is so bloated is that, if I remember correctly, it originally used "pointer+size" style, but during standardization they also added many "begin+end iterator" style overloads to harmonize it with STL.
* Add StringFormat * Use StringFormat
The new StringFormat takes a simple format string but uses C++ varargs to avoid all the
.c_str()values and"%" PRId64format specifiers.This is a preparatory step for implementing
HARD_ASSERTto match the other Firestore ports.This does not use ostream-style formatters in the way that
CHECKandDCHECKdo within google3; rather there's still a vastly simplified format string. The result is something that will port easily to the Android and JS SDKs.All uses of
StringPrintf(except those in the implementation offirebase_assert.h) have been converted to the new form.I attempted to make
FormatArgautomatically invokeToString()if the value declared it, but was stymied by lifetime issues for the resulting string. This can't be done in theFormatArgconstructor, but could be done in a separate helper invoked from theStringFormattemplate function. Theenable_ifmagic to make this work started to trip me up though so I figured I'd review this as is before sinking any more time into it.