Add tests for the count function
#1530
    "age" to columnOf(15, 20, 25),
    "group" to columnOf(1, 1, 2),
)
val age = df["age"].cast<Int>()
What would be the best way to do that without using the compiler plugin? I also tried val age by column<Int>() and then df[age], but that is marked as deprecated.
I think this approach is the best for now. We deprecated the column accessor API (i.e. val myCol by column(); df[myCol]) in favor of the compiler plugin, but the String API is still available when the compiler plugin cannot be used, like here :)
You're using the String API correctly. The only alternative I can think of is df.get { "age"<Int>() } but it achieves exactly the same thing.
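For reference, the two String API forms mentioned above could be written side by side roughly as follows. This is a minimal sketch, assuming the Kotlin DataFrame library is on the classpath; sampleDf and ageColumn are hypothetical helper names, the rows other than the ages and groups are made up for illustration, and the frame is built with the header-then-values form of dataFrameOf rather than the exact construction used in the PR.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Hypothetical sample frame; only "Alice", the ages and the groups come from the PR.
fun sampleDf(): DataFrame<*> =
    dataFrameOf("name", "age", "group")(
        "Alice", 15, 1,
        "Bob", 20, 1,
        "Charlie", 25, 2,
    )

// String API: look the column up by name, then cast it to the expected type.
fun ageColumn(df: DataFrame<*>): DataColumn<Int> = df["age"].cast<Int>()

fun main() {
    val df = sampleDf()
    val age = ageColumn(df)
    // Alternative suggested in the review: select the typed column via the columns DSL.
    val sameAge = df.get { "age"<Int>() }
    println(age.values().toList())
    println(sameAge)
}
```

Neither form needs the compiler plugin or the deprecated column accessor API; both only rely on the column name and the type supplied at the call site.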
fun `count on dataframe`() {
    df.count() shouldBe 3
    df.count { age > 18 } shouldBe 2
    df.count { it["name"] == "Alice" } shouldBe 1
Why can I do df.count { age > 18 } but cannot do df.count { name.startsWith("A") }? age and name are obtained in the same way, and we made sure that they have the types that are appropriate for the called methods.
You're calling CountTests.age inside count {}, not the age column from df :)
You can use df.count { column<Int>("age") > 18 } to access a column from df by String name and type within this DSL. Or the shortcut:
df.count { "age"<Int>() > 18 }
You can do the same with "name", and then you can call startsWith :)
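Put together, the suggestion could look like the sketch below, which resolves both columns from the row inside the count lambda instead of capturing a property of the test class. It assumes the Kotlin DataFrame library is available; people, countAdults and countNamesStartingWithA are hypothetical names, and the rows other than "Alice" and the ages are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.count
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Hypothetical sample frame; only "Alice" and the ages appear in the PR.
fun people(): DataFrame<*> =
    dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 20,
        "Charlie", 25,
    )

// "age"<Int>() reads the value of the "age" column for the current row,
// so the comparison runs on a plain Int.
fun countAdults(df: DataFrame<*>): Int =
    df.count { "age"<Int>() > 18 }

// The same accessor typed as String makes String methods such as startsWith available.
fun countNamesStartingWithA(df: DataFrame<*>): Int =
    df.count { "name"<String>().startsWith("A") }

fun main() {
    val df = people()
    println("adults=${countAdults(df)}, startsWithA=${countNamesStartingWithA(df)}")
}
```

Because the column is looked up by name on each row, nothing outside the lambda can shadow it, which avoids the CountTests.age confusion described above.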
I see, thank you!
@Test
fun `count on empty grouped dataframe`() {
    emptyDf.groupBy("group").count().count() shouldBe 0
If we do emptyDf.groupBy("group").count(), it returns a dataframe without the count column. Is that expected?
Good catch! We should at least get an empty column named count. This causes runtime exceptions with the compiler plugin, but I think it's an issue deep inside aggregation itself... I'll open an issue.
Thanks! #1531
Fixes #1496