Data Aggregation and Group Operations

Questions

Question 1

According to the split-apply-combine strategy for group operations, what is the correct sequence of steps?

Question 2

Which of the following is NOT described as a valid form for a grouping key in a pandas groupby operation?

Question 3

When performing a groupby aggregation like .mean() on a DataFrame, what is a 'nuisance column'?

Question 4

What is the key distinction between the .size() and .count() methods on a pandas GroupBy object?
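
The contrast the question points at can be explored on a small example. The following is a minimal sketch using a hypothetical toy DataFrame (the column names `key` and `value` are assumptions, not taken from the book):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "a" contains one missing value.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0, 5.0],
})

grouped = df.groupby("key")

# size() reports the number of rows in each group, missing values included.
sizes = grouped.size()

# count() reports the number of non-null values in the selected column.
counts = grouped["value"].count()

print(sizes)
print(counts)
```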

Question 5

When you iterate over a GroupBy object created with multiple keys, such as in `for (k1, k2), group in df.groupby(["key1", "key2"]):`, what is the type and content of the first element in the yielded tuple?

Question 6

What is the difference in the object type returned by `df.groupby(["key1", "key2"])["data2"]` versus `df.groupby(["key1", "key2"])[["data2"]]`?

Question 7

When grouping a DataFrame `people` by its columns using a dictionary mapping, `people.groupby(mapping, axis="columns")`, what happens to columns in the DataFrame that are not present as keys in the mapping dictionary?

Question 8

To apply multiple aggregation functions, such as 'mean' and 'std', to a GroupBy Series object named `grouped_pct`, which syntax would produce a DataFrame with columns named after the functions?

Question 9

How can you apply multiple aggregation functions to a GroupBy object and provide custom names for the resulting columns?

Question 10

What is the primary function of the `as_index=False` parameter in a `groupby` operation?

Question 11

Which method is described as the most general-purpose GroupBy method that splits an object, invokes a passed function on each piece, and attempts to concatenate the pieces?

Question 12

When using `groupby(...).apply(my_function)`, how can you pass additional arguments to `my_function`, such as `n=1` and `column="total_bill"`?

Question 13

What is the purpose of the `group_keys=False` parameter when used with `groupby(...).apply()`?

Question 14

What is the primary characteristic of the `transform` method on a GroupBy object compared to the `apply` method?

Question 15

Which of the following is NOT a constraint on a function used with the `transform` method?

Question 16

What is meant by an 'unwrapped' group operation in pandas?

Question 17

What is a pivot table, as described in the context of pandas?

Question 18

In the pandas `pivot_table` method, what is the default aggregation function if `aggfunc` is not specified?

Question 19

What does the `margins=True` argument do in the `pivot_table` and `crosstab` functions?

Question 20

What is a cross-tabulation (crosstab), and which pandas function is used to create one?

Question 21

Consider a DataFrame `df` with columns 'key1' and 'key2'. To get the number of occurrences of each unique pair of values from these columns, which GroupBy method is most direct?

Question 22

In Chapter 10.1, a DataFrame `df` is created. If you execute `df.groupby('key1', dropna=False).size()`, how are the missing values in the 'key1' column handled?

Question 23

If `grouped = df.groupby('key1')`, which code snippet will calculate the difference between the maximum and minimum of 'data1' for each group in 'key1'?

Question 24

To apply different aggregations to different columns of a grouped DataFrame `grouped`—for instance, `np.max` on the 'tip' column and 'sum' on the 'size' column—what is the correct syntax for the `.agg()` method?

Question 25

In the example from Chapter 10.3 to fill missing state data, `data.groupby(group_key).apply(fill_mean)`, where `fill_mean` is `def fill_mean(group): return group.fillna(group.mean())`, how is the fill value determined for each missing value?
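
The group-wise fill can be tried on a small Series. This is a rough sketch, not the book's exact data: the values below are hypothetical and chosen so the group means are easy to check by hand, and `group_keys=False` is an added assumption that keeps the original state labels as the result's index.

```python
import numpy as np
import pandas as pd

states = ["Ohio", "New York", "Vermont", "Florida",
          "Oregon", "Nevada", "California", "Idaho"]
group_key = ["East"] * 4 + ["West"] * 4

# Hypothetical values; NaN marks the entries to be filled.
data = pd.Series([1.0, 2.0, np.nan, 3.0, 10.0, np.nan, 20.0, np.nan],
                 index=states)

def fill_mean(group):
    # group.mean() skips NaN, so it is the mean of the observed values
    # in this group only.
    return group.fillna(group.mean())

# group_keys=False (an assumption for this sketch) avoids adding the
# "East"/"West" labels as an extra index level.
filled = data.groupby(group_key, group_keys=False).apply(fill_mean)

print(filled)
```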

Question 26

What is the result of using `groupby` in conjunction with `pd.cut` on a DataFrame column, as shown in the quantile analysis example?

Question 27

In the group-wise linear regression example from Chapter 10.3, `by_year.apply(regress, yvar="AAPL", xvars=["SPX"])`, what does the `apply` function do?

Question 28

Given a grouped Series `g`, what is the output of `g.transform('mean')`?

Question 29

Why are built-in aggregate functions like 'mean' or 'sum' often much faster when used with `transform` compared to a general apply function?

Question 30

In the `tips.pivot_table(index=["time", "day"], columns="smoker", values=["tip_pct", "size"])` example, what do the `index`, `columns`, and `values` arguments specify?

Question 31

To create a cross-tabulation of `tips["time"]` and `tips["day"]` against `tips["smoker"]`, which is the correct `pd.crosstab` syntax?

Question 32

What does invoking the `mean()` method on a `GroupBy` object, such as `df.groupby('key1').mean()`, actually compute?

Question 33

Consider a DataFrame with a MultiIndex on its columns, named 'cty' and 'tenor'. How would you group the DataFrame by the 'cty' level of the column index?

Question 34

Why are custom aggregation functions passed to `.agg()` generally much slower than the optimized functions listed in Table 10-1 (e.g., 'sum', 'mean')?

Question 35

If you apply `.describe()` to a GroupBy object, what is the result?

Question 36

What does the `fill_value` argument in the `pivot_table` method accomplish?

Question 37

In the weighted average example, the function `get_wavg` takes a group and returns `np.average(group["data"], weights=group["weights"])`. How is this function used to compute the weighted average for each category?
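
One way to experiment with this pattern is sketched below. The DataFrame contents are hypothetical, and selecting `[["data", "weights"]]` before calling `apply` is a choice made for this sketch (so the function only sees the two columns it needs), not necessarily how the book writes it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a grouping column, values, and weights.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "data": [1.0, 3.0, 10.0, 20.0],
    "weights": [1.0, 3.0, 1.0, 1.0],
})

def get_wavg(group):
    # Weighted average of "data" using "weights", within a single group.
    return np.average(group["data"], weights=group["weights"])

# apply calls get_wavg once per category and collects the scalar results
# into a Series indexed by category.
wavg = df.groupby("category")[["data", "weights"]].apply(get_wavg)

print(wavg)
```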

Question 38

When grouping a DataFrame with a function, for example `people.groupby(len).sum()`, what is the function `len` applied to?

Question 39

In the code `df.groupby(["key1", "key2"])[["data2"]].mean()`, what will the structure of the output be?

Question 40

What does the `nsmallest` method, when used on a GroupBy object like `grouped["data1"].nsmallest(2)`, accomplish?

Question 41

If you want to apply a list of functions `["count", "mean", "max"]` to two columns `["tip_pct", "total_bill"]` of a GroupBy object `grouped`, what is the structure of the resulting DataFrame?

Question 42

To get a group-wise ranking of values in descending order using `transform`, you could use the function `def get_ranks(group): return group.rank(ascending=False)`. What will be the characteristics of the output of `g.transform(get_ranks)`?
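
The behavior can be inspected on a small grouped Series. The sketch below assumes a hypothetical DataFrame with columns `key` and `value`; only `get_ranks` comes from the question.

```python
import pandas as pd

# Hypothetical toy data with interleaved groups.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [3.0, 10.0, 1.0, 30.0, 2.0],
})

g = df.groupby("key")["value"]

def get_ranks(group):
    # Rank within the group; the largest value gets rank 1.
    return group.rank(ascending=False)

# transform returns a result aligned with the original rows.
ranks = g.transform(get_ranks)

print(ranks)
```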

Question 43

What is the result of `df.groupby('key1')['key1'].count()`?

Question 44

If a DataFrame `df` has 7 rows and you execute `df.groupby('key1').mean()`, and the result in `Out[25]` has two rows with key1 values 'a' and 'b', what can you infer about the 'key1' column in the original DataFrame?

Question 45

Which of the optimized groupby methods listed in Table 10-1 would you use to compute the cumulative sum of non-NA values within each group?

Question 46

In the random sampling example from Chapter 10.3, a deck of cards is grouped by suit using a function `get_suit`. The code `deck.groupby(get_suit).apply(draw, n=2)` is then used. What is the purpose of this operation?

Question 47

Consider the code `tips.groupby(["day", "smoker"]).mean()`. To get the same aggregated data but in a 'flat' format where 'day' and 'smoker' are columns instead of the index, which is the most direct modification?

Question 48

In the `pd.crosstab` example using the tips data, `pd.crosstab([tips["time"], tips["day"]], tips["smoker"])`, what do the rows and columns of the resulting table represent?

Question 49

If `df.groupby('key2').mean()` is executed on the DataFrame from `In [15]`, why is the 'key1' column absent from the output shown in `Out[26]`?

Question 50

What is a key benefit of using the `transform` method with a fast-path function like 'mean' and performing arithmetic on the results (an 'unwrapped' operation), compared to using `.apply` with a complex function?
