Skip to content

Conversation

@wilhuff
Copy link
Contributor

@wilhuff wilhuff commented May 17, 2018

The new StringFormat takes a simple format string but uses C++ varargs to avoid all the .c_str() values and "%" PRId64 format specifiers.

This is a preparatory step for implementing HARD_ASSERT to match the other Firestore ports.

This does not use ostream-style formatters in the way that CHECK and DCHECK do within google3; rather there's still a vastly simplified format string. The result is something that will port easily to the Android and JS SDKs.

All uses of StringPrintf (except those in the implementation of firebase_assert.h) have been converted to the new form.

I attempted to make FormatArg automatically invoke ToString() if the value declared it, but was stymied by lifetime issues for the resulting string. This can't be done in the FormatArg constructor, but could be done in a separate helper invoked from the StringFormat template function. The enable_if magic to make this work started to trip me up though so I figured I'd review this as is before sinking any more time into it.

Copy link
Contributor

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the current state-of-the-art in printf replacement is fmt (proposed for standardization). It's overkill for our purposes, just mentioning it in case you haven't run into it.

}

TEST(StringFormatTest, Pointer) {
// pointers implicitly convert to bool. Make sure this doesn't happen in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: also test with nullptr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

};

Foo foo;
EXPECT_EQ("Hello Foo", StringFormat("Hello %s", foo.ToString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: why not just test passing a std::string, without using a class with a ToString converter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally intended to make it such that the implementation would call ToString() for you, but couldn't make that work in the time I allowed myself to try to implement it. I've added explicit CString and String tests and removed this test.

TEST(StringFormatTest, Missing) {
EXPECT_EQ("Hello <missing>", StringFormat("Hello %s"));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: some additional tests to consider:

  • literal %;
  • format specifier in the beginning of the string (StringFormat("%s world", "Hello"));
  • format string that doesn't end with a format specifier (StringFormat("Hello%sworld, " "));
  • passing empty string as an argument;
  • more than one format specifier with nothing else in-between (StringFormat("Hello%s%s", " ", "world"), StringFormat("%%%s%%), "cmd_style_variable);
  • more arguments than format specifiers (StringFormat("Hello %s", 1, 2, 3));
  • absl::string_view as an argument (assuming it works).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

literal %;

Done

format specifier in the beginning of the string (StringFormat("%s world", "Hello"));

Done

format string that doesn't end with a format specifier (StringFormat("Hello%sworld, " "));

Done

passing empty string as an argument;

This was already in the Empty case.

more than one format specifier with nothing else in-between (StringFormat("Hello%s%s", " ", "world"), StringFormat("%%%s%%), "cmd_style_variable);

Done--though I'm not sure if "cmd_style_variable was other than a typo. If there's some secret ninja feature of C++ I'm not testing here, please let me know (and I wouldn't be surprised--this seems to happen all the time).

more arguments than format specifiers (StringFormat("Hello %s", 1, 2, 3));

Done

absl::string_view as an argument (assuming it works).

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done--though I'm not sure if "cmd_style_variable was other than a typo.

Sorry, unfortunately that was a typo. I seem to vaguely remember that in Windows .cmd files, variables are referenced using %variable% syntax, which is what was referring to.

This was already in the Empty case.

Sorry, overlooked that the format string wasn't empty in that case.

*/
template <typename... FA>
std::string StringFormat(const char* format, const FA&... args) {
return internal::StringFormatPieces(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: assert that format is not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loathe to assert anything in here since this is intended to be part of the implementation of assert. Additionally, that would creates a circularity that maybe I could resolve by removing the firebase_firestore_util_base target. This seems like more trouble than it's worth. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't consider the circularity. One option is to silently return, but it could mask bugs. I suppose the best option is to leave as is.

*
* The following format specifiers are recognized:
* * "%%" - A literal "%"
* * "%s" - The next parameter is copied through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: consider specifying failure strategy:

  • what happens if the number of format specifiers and arguments don't match?
  • what happens if an invalid specifier is passed?
  • which types are supported as args? (presumably primitives, C-style strings, and std::string)

I understand the details might change, so perhaps just a general statement on whether bad input is tolerated or not (i.e., will it produce some result or crash).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this implementation tolerant of bad input mostly because we use it in generating error messages and didn't want to create a double-fault situation. I hesitate to make this extensible and complicate the interface.

For now I've just added comments about what will happen if you mess up.

One question that's open to debate is what to do if there's an argument for an invalid specifier. For now I treat the invalid specifier as garbage and don't advance the current piece but I could see this going the other way. If you prefer I could discard the current argument or insert the current piece instead. LMK if you have any preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving as is is fine. As long as the function doesn't blow up, either of those behaviors seem reasonable -- it's a debug-only function, after all.

* formatting of the value as an unsigned integer.
* * Otherwise the value is interpreted as anything absl::AlphaNum accepts.
*/
class FormatArg : public absl::AlphaNum {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness' sake, it looks like AlphaNum is unsupported for use outside Abseil. I don't object to using it, though, unless there's a trivial workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't, short of just copying the code for those cases here. Since we control the import I'm relatively OK with this too. With your help in testing coverage suggestions I think we'll know if they break us.

I considered briefly attempting to push this upstream but they now have documented behavior that booleans format as "0" or "1" so I'm fairly well convinced that the template magic to detect exactly boolean wouldn't be accepted.

const char* spec_ptr = percent_ptr + 1;
if (spec_ptr == format_end) {
// Incomplete specifier, treat as a literal "%" and be done.
result.append("%", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can just push back a char, result.push_back('%');.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

auto pieces_iter = pieces.begin();
auto pieces_end = pieces.end();

while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional nit: I don't really like for(ever) loops, because it's necessary to read the whole implementation to find out how the loop iterates. For this reason, I'll try to provide an alternative version, but as usual, it is entirely optional, feel free to ignore:

auto end = format + strlen(format);
const auto next_specifier = [&](const char* i) { return std::find(i, end, '%'); };
for (auto from = format, to = next_specifier(from); to != end;
    from = to + 2, to = next_specifier(from)) {
  result.append(from, to);
  bool complete_specifier = append_specifier(to);
  if (!complete_specifier)
    break;
}
result.append(to, end); // This is now outside the loop
// ...
const auto append_specifier = [&](const char* i) {
  if (i == end) return true;
  if (i + 1 == end) {
    result += '%';
    return false;
  }
  switch (i[1]) {
  // ...
  }
  return true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're not a fan, but in cases like these I find the contortions required to fit legibly into a traditional for loop are a cure worse than the disease. I've pulled some of the details out of the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the updated version is a lot more readable, thanks for doing this.

}

// Examine the specifier:
const char* spec_ptr = percent_ptr + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: consider extracting the logic related to the specifier into a helper function or a lambda, so that the loop is easier to examine at a glance.


// percent either points to the next format specifier or the end of the
// format string. Either is safe to append here:
result.append(format_iter, percent_ptr - format_iter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I think append has an overload that accepts two iterators, which should make append(format_iter, percent_ptr) work without the need to calculate distance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I had convinced myself that overload came in C++14, but you're right. Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons string interface is so bloated is that, if I remember correctly, it originally used "pointer+size" style, but during standardization they also added many "begin+end iterator" style overloads to harmonize it with STL.

@var-const var-const assigned wilhuff and unassigned var-const May 17, 2018
Copy link
Contributor Author

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of fmt. If you'd prefer I could make this fmt compatible in that the one specifier I would accept would be "{}".


// percent either points to the next format specifier or the end of the
// format string. Either is safe to append here:
result.append(format_iter, percent_ptr - format_iter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I had convinced myself that overload came in C++14, but you're right. Done.

const char* spec_ptr = percent_ptr + 1;
if (spec_ptr == format_end) {
// Incomplete specifier, treat as a literal "%" and be done.
result.append("%", 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* formatting of the value as an unsigned integer.
* * Otherwise the value is interpreted as anything absl::AlphaNum accepts.
*/
class FormatArg : public absl::AlphaNum {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't, short of just copying the code for those cases here. Since we control the import I'm relatively OK with this too. With your help in testing coverage suggestions I think we'll know if they break us.

I considered briefly attempting to push this upstream but they now have documented behavior that booleans format as "0" or "1" so I'm fairly well convinced that the template magic to detect exactly boolean wouldn't be accepted.

*
* The following format specifiers are recognized:
* * "%%" - A literal "%"
* * "%s" - The next parameter is copied through
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this implementation tolerant of bad input mostly because we use it in generating error messages and didn't want to create a double-fault situation. I hesitate to make this extensible and complicate the interface.

For now I've just added comments about what will happen if you mess up.

One question that's open to debate is what to do if there's an argument for an invalid specifier. For now I treat the invalid specifier as garbage and don't advance the current piece but I could see this going the other way. If you prefer I could discard the current argument or insert the current piece instead. LMK if you have any preference.

// pointers implicitly convert to bool. Make sure this doesn't happen in
// this API.
int value = 4;
EXPECT_NE("Hello true", StringFormat("Hello %s", &value));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm willing to go out on a limb and say that passing const char* that is not a C string is irresponsible. I'm not really willing to convolute the API to support it, since you can straightforwardly work around this by casting void*. I've added documentation to this effect.

};

Foo foo;
EXPECT_EQ("Hello Foo", StringFormat("Hello %s", foo.ToString()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally intended to make it such that the implementation would call ToString() for you, but couldn't make that work in the time I allowed myself to try to implement it. I've added explicit CString and String tests and removed this test.

TEST(StringFormatTest, Missing) {
EXPECT_EQ("Hello <missing>", StringFormat("Hello %s"));
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

literal %;

Done

format specifier in the beginning of the string (StringFormat("%s world", "Hello"));

Done

format string that doesn't end with a format specifier (StringFormat("Hello%sworld, " "));

Done

passing empty string as an argument;

This was already in the Empty case.

more than one format specifier with nothing else in-between (StringFormat("Hello%s%s", " ", "world"), StringFormat("%%%s%%), "cmd_style_variable);

Done--though I'm not sure if "cmd_style_variable was other than a typo. If there's some secret ninja feature of C++ I'm not testing here, please let me know (and I wouldn't be surprised--this seems to happen all the time).

more arguments than format specifiers (StringFormat("Hello %s", 1, 2, 3));

Done

absl::string_view as an argument (assuming it works).

Done

*/
template <typename... FA>
std::string StringFormat(const char* format, const FA&... args) {
return internal::StringFormatPieces(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loathe to assert anything in here since this is intended to be part of the implementation of assert. Additionally, that would creates a circularity that maybe I could resolve by removing the firebase_firestore_util_base target. This seems like more trouble than it's worth. WDYT?

auto pieces_iter = pieces.begin();
auto pieces_end = pieces.end();

while (true) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're not a fan, but in cases like these I find the contortions required to fit legibly into a traditional for loop are a cure worse than the disease. I've pulled some of the details out of the loop.

Copy link
Contributor

@var-const var-const left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re. fmt -- it emulates Python3 style of formatting strings. I think either approach is reasonable, so I'm fine with whichever is your preference. If you have plans to expand StringFormat, then perhaps it would be better to adopt {} style (it also feels more C++11, with all the curly braces, and we could even do {auto} custom extension). However, at this point it looks like a utility function for debugging, so perhaps it's not worth the trouble doing any changes.

// pointers implicitly convert to bool. Make sure this doesn't happen in
// this API.
int value = 4;
EXPECT_NE("Hello true", StringFormat("Hello %s", &value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks for adding docs.

TEST(StringFormatTest, Missing) {
EXPECT_EQ("Hello <missing>", StringFormat("Hello %s"));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done--though I'm not sure if "cmd_style_variable was other than a typo.

Sorry, unfortunately that was a typo. I seem to vaguely remember that in Windows .cmd files, variables are referenced using %variable% syntax, which is what was referring to.

This was already in the Empty case.

Sorry, overlooked that the format string wasn't empty in that case.

*
* The following format specifiers are recognized:
* * "%%" - A literal "%"
* * "%s" - The next parameter is copied through
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving as is is fine. As long as the function doesn't blow up, either of those behaviors seem reasonable -- it's a debug-only function, after all.

auto pieces_iter = pieces.begin();
auto pieces_end = pieces.end();

while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the updated version is a lot more readable, thanks for doing this.

*/
template <typename... FA>
std::string StringFormat(const char* format, const FA&... args) {
return internal::StringFormatPieces(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't consider the circularity. One option is to silently return, but it could mask bugs. I suppose the best option is to leave as is.


// percent either points to the next format specifier or the end of the
// format string. Either is safe to append here:
result.append(format_iter, percent_ptr - format_iter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons string interface is so bloated is that, if I remember correctly, it originally used "pointer+size" style, but during standardization they also added many "begin+end iterator" style overloads to harmonize it with STL.

@wilhuff wilhuff merged commit 11933c0 into master May 21, 2018
@wilhuff wilhuff deleted the wilhuff/string-format branch May 21, 2018 20:17
minafarid pushed a commit to minafarid/firebase-ios-sdk that referenced this pull request Jun 6, 2018
* Add StringFormat
* Use StringFormat
@firebase firebase locked and limited conversation to collaborators Nov 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants