Data Aggregation and Group Operations
52 questions available
Questions
According to the split-apply-combine strategy for group operations, what is the correct sequence of steps?
Which of the following is NOT described as a valid form for a grouping key in a pandas groupby operation?
When performing a groupby aggregation like .mean() on a DataFrame, what is a 'nuisance column'?
What is the key distinction between the .size() and .count() methods on a pandas GroupBy object?
When you iterate over a GroupBy object created with multiple keys, such as in 'for (k1, k2), group in df.groupby(["key1", "key2"]):', what is the type and content of the first element in the yielded tuple?
What is the difference in the object type returned by `df.groupby(["key1", "key2"])["data2"]` versus `df.groupby(["key1", "key2"])[["data2"]]`?
When grouping a DataFrame `people` by its columns using a dictionary mapping, `people.groupby(mapping, axis="columns")`, what happens to columns in the DataFrame that are not present as keys in the mapping dictionary?
To apply multiple aggregation functions, such as 'mean' and 'std', to a GroupBy Series object named `grouped_pct`, which syntax would produce a DataFrame with columns named after the functions?
How can you apply multiple aggregation functions to a GroupBy object and provide custom names for the resulting columns?
What is the primary function of the `as_index=False` parameter in a `groupby` operation?
Which method is described as the most general-purpose GroupBy method that splits an object, invokes a passed function on each piece, and attempts to concatenate the pieces?
When using `groupby(...).apply(my_function)`, how can you pass additional arguments to `my_function`, such as `n=1` and `column="total_bill"`?
What is the purpose of the `group_keys=False` parameter when used with `groupby(...).apply()`?
What is the primary characteristic of the `transform` method on a GroupBy object compared to the `apply` method?
Which of the following is NOT a constraint on a function used with the `transform` method?
What is meant by an 'unwrapped' group operation in pandas?
What is a pivot table, as described in the context of pandas?
In the pandas `pivot_table` method, what is the default aggregation function if `aggfunc` is not specified?
What does the `margins=True` argument do in the `pivot_table` and `crosstab` functions?
What is a cross-tabulation (crosstab), and which pandas function is used to create one?
Consider a DataFrame `df` with columns 'key1' and 'key2'. To get the number of occurrences of each unique pair of values from these columns, which GroupBy method is most direct?
In Chapter 10.1, a DataFrame `df` is created. If you execute `df.groupby('key1', dropna=False).size()`, how are the missing values in the 'key1' column handled?
If `grouped = df.groupby('key1')`, which code snippet will calculate the difference between the maximum and minimum of 'data1' for each group in 'key1'?
If `grouped = df.groupby('key1')`, what is the most direct way to calculate the difference between the maximum and minimum of the 'data1' column for each group?
To apply different aggregations to different columns of a grouped DataFrame `grouped`, for instance `np.max` on the 'tip' column and 'sum' on the 'size' column, what is the correct syntax for the `.agg()` method?
In the example from Chapter 10.3 to fill missing state data, `data.groupby(group_key).apply(fill_mean)`, where `fill_mean` is `def fill_mean(group): return group.fillna(group.mean())`, how is the fill value determined for each missing value?
What is the result of using `groupby` in conjunction with `pd.cut` on a DataFrame column, as shown in the quantile analysis example?
In the group-wise linear regression example from Chapter 10.3, `by_year.apply(regress, yvar="AAPL", xvars=["SPX"])`, what does the `apply` function do?
Given a grouped Series `g`, what is the output of `g.transform('mean')`?
Why are built-in aggregate functions like 'mean' or 'sum' often much faster when used with `transform` compared to a general apply function?
In the `tips.pivot_table(index=["time", "day"], columns="smoker", values=["tip_pct", "size"])` example, what do the `index`, `columns`, and `values` arguments specify?
To create a cross-tabulation of `tips["time"]` and `tips["day"]` against `tips["smoker"]`, which is the correct `pd.crosstab` syntax?
What does invoking the `mean()` method on a `GroupBy` object, such as `df.groupby('key1').mean()`, actually compute?
Consider a DataFrame with a MultiIndex on its columns, named 'cty' and 'tenor'. How would you group the DataFrame by the 'cty' level of the column index?
Why are custom aggregation functions passed to `.agg()` generally much slower than the optimized functions listed in Table 10-1 (e.g., 'sum', 'mean')?
If you apply `.describe()` to a GroupBy object, what is the result?
What does the `fill_value` argument in the `pivot_table` method accomplish?
In the weighted average example, the function `get_wavg` is defined as `np.average(group["data"], weights=group["weights"])`. How is this function used to compute the weighted average for each category?
When grouping a DataFrame with a function, for example `people.groupby(len).sum()`, what is the function `len` applied to?
In the code `df.groupby(["key1", "key2"])[["data2"]].mean()`, what will the structure of the output be?
What does the `nsmallest` method, when used on a GroupBy object like `grouped["data1"].nsmallest(2)`, accomplish?
If you want to apply a list of functions `["count", "mean", "max"]` to two columns `["tip_pct", "total_bill"]` of a GroupBy object `grouped`, what is the structure of the resulting DataFrame?
To get a group-wise ranking of values in descending order using `transform`, you could use the function `def get_ranks(group): return group.rank(ascending=False)`. What will be the characteristics of the output of `g.transform(get_ranks)`?
What is the result of `df.groupby('key1')['key1'].count()`?
If a DataFrame `df` has 7 rows and you execute `df.groupby('key1').mean()`, and the result in `Out[25]` has two rows with key1 values 'a' and 'b', what can you infer about the 'key1' column in the original DataFrame?
Which of the optimized groupby methods listed in Table 10-1 would you use to compute the cumulative sum of non-NA values within each group?
In the random sampling example from Chapter 10.3, a deck of cards is grouped by suit using a function `get_suit`. The code `deck.groupby(get_suit).apply(draw, n=2)` is then used. What is the purpose of this operation?
Consider the code `tips.groupby(["day", "smoker"]).mean()`. To get the same aggregated data but in a 'flat' format where 'day' and 'smoker' are columns instead of the index, which is the most direct modification?
Consider the code `tips.groupby(["day", "smoker"]).mean()`. If you wanted the same aggregated data but in a 'flat' format where 'day' and 'smoker' are columns, what change would you make?
In the `pd.crosstab` example using the tips data, `pd.crosstab([tips["time"], tips["day"]], tips["smoker"])`, what do the rows and columns of the resulting table represent?
If `df.groupby('key2').mean()` is executed on the DataFrame from `In [15]`, why is the 'key1' column absent from the output shown in `Out[26]`?
What is a key benefit of using the `transform` method with a fast-path function like 'mean' and performing arithmetic on the results (an 'unwrapped' operation), compared to using `.apply` with a complex function?
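Practice sketch
Several questions above refer to a DataFrame `df` with 7 rows, a 'key1' column containing missing values, and a numeric 'data1' column. The sketch below builds a small stand-in so the `.size()`, `.count()`, and `dropna=False` calls can be tried directly. The data values are made up for illustration and are not the book's `df`; `dropna=` assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 7 rows, two missing group keys, one missing value.
df = pd.DataFrame({
    "key1": ["a", "a", None, "b", "b", "a", None],
    "data1": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Rows per group; rows whose key is missing are excluded by default.
sizes = df.groupby("key1").size()

# Non-null 'data1' values per group.
counts = df.groupby("key1")["data1"].count()

# Same as `sizes`, but missing keys form their own group.
sizes_with_na = df.groupby("key1", dropna=False).size()

print(sizes)
print(counts)
print(sizes_with_na)
```

Comparing the three printed results against each other, and against `len(df)`, is one way to check your answers to the `.size()`, `.count()`, and `dropna` questions.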