Which method is described as the most general-purpose GroupBy method that splits an object, invokes a passed function on each piece, and attempts to concatenate the pieces?
Explanation
This question tests the ability to distinguish among the high-level GroupBy methods (`apply`, `agg`, `transform`, `filter`) by identifying `apply` as the most flexible and general one. `agg` reduces each group to aggregated values, `transform` must return either a scalar to broadcast or a result with the same shape as its input group, and `filter` only keeps or drops whole groups; `apply`, by contrast, accepts a function that may return a scalar, a Series, or a DataFrame, and then attempts to concatenate the pieces.
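For example, here is a minimal sketch (the DataFrame `df` and the helper `top2` are made up for illustration, not taken from the source) of how `apply` splits an object, invokes the passed function on each piece, and concatenates the results:

```python
import pandas as pd

# Hypothetical example data, not from the source question.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

def top2(group):
    # Each piece is the sub-DataFrame for one key; return its two
    # largest rows by "value" -- a result that agg or transform
    # could not return directly.
    return group.sort_values("value", ascending=False).head(2)

# Split df by "key", call top2 on each piece, then concatenate
# the per-group results into a single DataFrame.
result = df.groupby("key", group_keys=True).apply(top2)
print(result)
```

Because `top2` returns a DataFrame for each group, the combined result carries a hierarchical index: the group keys on the outer level and the original row labels on the inner level.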
Other questions
According to the split-apply-combine strategy for group operations, what is the correct sequence of steps?
Which of the following is NOT described as a valid form for a grouping key in a pandas groupby operation?
When performing a groupby aggregation like .mean() on a DataFrame, what is a 'nuisance column'?
What is the key distinction between the .size() and .count() methods on a pandas GroupBy object?
When you iterate over a GroupBy object created with multiple keys, as in `for (k1, k2), group in df.groupby(["key1", "key2"]):`, what is the type and content of the first element in the yielded tuple?
What is the difference in the object type returned by `df.groupby(["key1", "key2"])["data2"]` versus `df.groupby(["key1", "key2"])[["data2"]]`?
When grouping a DataFrame `people` by its columns using a dictionary mapping, `people.groupby(mapping, axis="columns")`, what happens to columns in the DataFrame that are not present as keys in the mapping dictionary?
To apply multiple aggregation functions, such as 'mean' and 'std', to a GroupBy Series object named `grouped_pct`, which syntax would produce a DataFrame with columns named after the functions?
How can you apply multiple aggregation functions to a GroupBy object and provide custom names for the resulting columns?
What is the primary function of the `as_index=False` parameter in a `groupby` operation?
When using `groupby(...).apply(my_function)`, how can you pass additional arguments to `my_function`, such as `n=1` and `column="total_bill"`?
What is the purpose of the `group_keys=False` parameter when used with `groupby(...).apply()`?
What is the primary characteristic of the `transform` method on a GroupBy object compared to the `apply` method?
Which of the following is NOT a constraint on a function used with the `transform` method?
What is meant by an 'unwrapped' group operation in pandas?
What is a pivot table, as described in the context of pandas?
In the pandas `pivot_table` method, what is the default aggregation function if `aggfunc` is not specified?
What does the `margins=True` argument do in the `pivot_table` and `crosstab` functions?
What is a cross-tabulation (crosstab), and which pandas function is used to create one?
Consider a DataFrame `df` with columns 'key1' and 'key2'. To get the number of occurrences of each unique pair of values from these columns, which GroupBy method is most direct?
In Chapter 10.1, a DataFrame `df` is created. If you execute `df.groupby('key1', dropna=False).size()`, how are the missing values in the 'key1' column handled?
If `grouped = df.groupby('key1')`, which code snippet will calculate the difference between the maximum and minimum of 'data1' for each group in 'key1'?
If `grouped = df.groupby('key1')`, what is the most direct way to calculate the difference between the maximum and minimum of the 'data1' column for each group?
To apply different aggregations to different columns of a grouped DataFrame `grouped`—for instance, `np.max` on the 'tip' column and 'sum' on the 'size' column—what is the correct syntax for the `.agg()` method?
In the Chapter 10.3 example that fills missing state data with `data.groupby(group_key).apply(fill_mean)`, where `fill_mean` is defined as `def fill_mean(group): return group.fillna(group.mean())`, how is the fill value determined for each missing value?
What is the result of using `groupby` in conjunction with `pd.cut` on a DataFrame column, as shown in the quantile analysis example?
In the group-wise linear regression example from Chapter 10.3, `by_year.apply(regress, yvar="AAPL", xvars=["SPX"])`, what does the `apply` function do?
Given a grouped Series `g`, what is the output of `g.transform('mean')`?
Why are built-in aggregate functions like 'mean' or 'sum' often much faster when used with `transform` compared to a general apply function?
In the `tips.pivot_table(index=["time", "day"], columns="smoker", values=["tip_pct", "size"])` example, what do the `index`, `columns`, and `values` arguments specify?
To create a cross-tabulation of `tips["time"]` and `tips["day"]` against `tips["smoker"]`, which is the correct `pd.crosstab` syntax?
What does invoking the `mean()` method on a `GroupBy` object, such as `df.groupby('key1').mean()`, actually compute?
Consider a DataFrame with a MultiIndex on its columns, named 'cty' and 'tenor'. How would you group the DataFrame by the 'cty' level of the column index?
Why are custom aggregation functions passed to `.agg()` generally much slower than the optimized functions listed in Table 10-1 (e.g., 'sum', 'mean')?
If you apply `.describe()` to a GroupBy object, what is the result?
What does the `fill_value` argument in the `pivot_table` method accomplish?
In the weighted average example, the function `get_wavg` returns `np.average(group["data"], weights=group["weights"])`. How is this function used to compute the weighted average for each category?
When grouping a DataFrame with a function, for example `people.groupby(len).sum()`, what is the function `len` applied to?
In the code `df.groupby(["key1", "key2"])[["data2"]].mean()`, what will the structure of the output be?
What does the `nsmallest` method, when used on a GroupBy object like `grouped["data1"].nsmallest(2)`, accomplish?
If you want to apply a list of functions `["count", "mean", "max"]` to two columns `["tip_pct", "total_bill"]` of a GroupBy object `grouped`, what is the structure of the resulting DataFrame?
To get a group-wise ranking of values in descending order using `transform`, you could use the function `def get_ranks(group): return group.rank(ascending=False)`. What will be the characteristics of the output of `g.transform(get_ranks)`?
What is the result of `df.groupby('key1')['key1'].count()`?
If a DataFrame `df` has 7 rows and you execute `df.groupby('key1').mean()`, and the result in `Out[25]` has two rows with key1 values 'a' and 'b', what can you infer about the 'key1' column in the original DataFrame?
Which of the optimized groupby methods listed in Table 10-1 would you use to compute the cumulative sum of non-NA values within each group?
In the random sampling example from Chapter 10.3, a deck of cards is grouped by suit using a function `get_suit`. The code `deck.groupby(get_suit).apply(draw, n=2)` is then used. What is the purpose of this operation?
Consider the code `tips.groupby(["day", "smoker"]).mean()`. To get the same aggregated data but in a 'flat' format where 'day' and 'smoker' are columns instead of the index, which is the most direct modification?
Consider the code `tips.groupby(["day", "smoker"]).mean()`. If you wanted the same aggregated data but in a 'flat' format where 'day' and 'smoker' are columns, what change would you make?
In the `pd.crosstab` example using the tips data, `pd.crosstab([tips["time"], tips["day"]], tips["smoker"])`, what do the rows and columns of the resulting table represent?
If `df.groupby('key2').mean()` is executed on the DataFrame from `In [15]`, why is the 'key1' column absent from the output shown in `Out[26]`?
What is a key benefit of using the `transform` method with a fast-path function like 'mean' and performing arithmetic on the results (an 'unwrapped' operation), compared to using `.apply` with a complex function?
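For context, here is a minimal sketch (with made-up grouping keys and values, not data from the source) of the `transform` fast path and the 'unwrapped' group operation pattern that several of the questions above refer to:

```python
import pandas as pd

# Hypothetical data; not taken from the source questions.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
keys = ["a", "a", "a", "b", "b", "b"]
g = s.groupby(keys)

# transform with a built-in fast-path function: each group's mean is
# broadcast back to the shape of the original Series.
group_means = g.transform("mean")

# "Unwrapped" operation: perform the arithmetic outside the groupby,
# here demeaning each value within its own group.
demeaned = s - group_means
print(demeaned)
```

The same result could be produced with an `apply` that demeans each group, but composing a fast-path `transform` with vectorized arithmetic avoids calling a Python function once per group.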