Data Wrangling: Join, Combine, and Reshape

50 questions available

Questions

Question 1

What is the primary function of the `unstack` method when applied to a hierarchically indexed Series in pandas?

Question 2

Which method is the inverse operation of `unstack` in pandas, used for pivoting columns into rows?

Question 3

When using `pandas.merge` without specifying the `on` or `how` arguments, what is the default behavior?

Question 4

What does the `set_index` function on a DataFrame accomplish?

Question 5

When performing a many-to-many merge in pandas, how is the resulting number of rows determined for matching keys?

Question 6

What is the purpose of the `suffixes` argument in the `pandas.merge` function?

Question 7

What is the primary difference between the DataFrame's `join` method and `pandas.merge`?

Question 8

What does the `combine_first` method do when used on two pandas Series or DataFrames?

Question 9

When using `pandas.concat` to combine several Series objects with `axis="columns"`, what do the values passed to the `keys` argument become in the resulting DataFrame?

Question 10

What is the effect of passing `ignore_index=True` to the `pandas.concat` function?

Question 11

What is the purpose of the DataFrame `pivot` method?

Question 12

Which method is described as the inverse operation to `pivot` for DataFrames, transforming data from a wide to a long format?

Question 13

In the `pandas.melt` function, what is the purpose of the `id_vars` argument?

Question 14

When sorting a hierarchically indexed object, what is the significance of the index being lexicographically sorted?

Question 15

How can you aggregate a DataFrame by a specific index level for summary statistics?

Question 16

Given `data = pd.Series([0.9, 0.2, 0.6, 0.7], index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]])`, what is the result of applying the `stack` method to the DataFrame returned by `data.unstack()`?
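
One way to check an answer to this question is to run the round trip directly. The sketch below uses the Series from the question; the names `wide` and `restacked` are chosen here for illustration only.

```python
import pandas as pd

# Series from the question, with a two-level row index
data = pd.Series(
    [0.9, 0.2, 0.6, 0.7],
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
)

wide = data.unstack()      # pivot the inner index level into columns
restacked = wide.stack()   # pivot the columns back into the row index

print(wide)
print(restacked)
```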

Question 17

When merging two DataFrames, `df1` and `df2`, on a key that results in a many-to-one join, how are the index values of the output DataFrame determined by default?

Question 18

Which `how` argument value in `pandas.merge` will result in a DataFrame containing the union of keys from both input DataFrames?

Question 19

If you perform an outer join on `df1` (which has a key value 'c' that is absent from `df2`) and `df2` (which has a key value 'd' that is absent from `df1`), what values will appear in the columns that come from the DataFrame with no matching row?

Question 20

To merge a DataFrame `lefth` with columns `key1`, `key2` and a DataFrame `righth` with a hierarchical index, how must you specify the join keys?

Question 21

In a DataFrame `frame` with a MultiIndex on the rows with levels named `key1` and `key2`, what does the method `frame.swaplevel("key1", "key2")` do?

Question 22

If you concatenate two DataFrames with overlapping row indexes but different columns using `pd.concat([df1, df2], axis="columns")`, what is the outcome for rows that exist in one DataFrame but not the other?

Question 23

When using `pandas.pivot` to reshape a DataFrame, if the specified `index` and `columns` arguments result in multiple values for a given cell, what happens?

Question 24

What is a key difference between `stack` and `melt`?

Question 25

Consider a DataFrame `df` with columns A, B, C, D. What is the result of `pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])`?
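
To experiment with this call, a small frame is enough. In the sketch below only the column names A, B, C, and D come from the question; the cell values are made up for illustration.

```python
import pandas as pd

# Hypothetical values; only the column names come from the question
df = pd.DataFrame({"A": ["x", "y"], "B": [1, 2], "C": [3, 4], "D": [5, 6]})

# A is kept as the identifier column; B and C are the columns to unpivot
melted = pd.melt(df, id_vars=["A"], value_vars=["B", "C"])

print(melted)
```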

Question 26

By default, the `unstack` method pivots which level of a MultiIndex?

Question 27

If unstacking a level in a DataFrame leaves some subgroups without all of the values present in that level, what does pandas introduce into the resulting DataFrame?

Question 28

Consider the code: `df1 = pd.DataFrame({'key': ['b', 'b', 'a'], 'data1': [0, 1, 2]})` and `df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': [0, 1, 2]})`. What is the number of rows in the output of `pd.merge(df1, df2, how='left')`?
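
The two frames in this question are small enough to build and merge directly. The sketch below reproduces them as given and prints the merged result so the row count can be checked; the name `result` is an illustrative choice.

```python
import pandas as pd

# Frames exactly as given in the question
df1 = pd.DataFrame({"key": ["b", "b", "a"], "data1": [0, 1, 2]})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": [0, 1, 2]})

# With no `on` argument, merge joins on the shared column name ("key")
result = pd.merge(df1, df2, how="left")

print(result)
print(len(result))
```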

Question 29

What is the primary data structure returned by `pd.MultiIndex.from_arrays`?

Question 30

If `frame` has a hierarchical index on its columns with levels named `state` and `color`, how would you select all columns under the `Ohio` state?

Question 31

What is the key difference in output between `stack()` and `stack(dropna=False)`?

Question 32

If DataFrames `df3` and `df4` are merged on differently named key columns with `pd.merge(df3, df4, left_on="lkey", right_on="rkey")`, what happens to the `rkey` column in the output?

Question 33

What are the three fundamental data combination operations in pandas mentioned at the beginning of Section 8.2?

Question 34

If `left2` and `right2` are DataFrames with different columns but partially overlapping indexes, what is the result of `left2.join(right2, how="outer")`?

Question 35

When using `pd.concat`, how can you create a hierarchical index on the concatenation axis to identify the original pieces of data?

Question 36

Consider the DataFrame `data` created in In [126]. What is the shape of the output of `result = data.stack()`?

Question 37

If `long_data` is a DataFrame in long format with columns `date`, `item`, and `value`, what does the code `pivoted = long_data.pivot(index="date", columns="item", values="value")` produce?

Question 38

What is the key difference between the default behavior of `set_index` and `set_index(drop=False)`?

Question 39

When is it appropriate to use `left_on` and `right_on` arguments in `pandas.merge`?

Question 40

If `left` has `key1`='foo', `key2`='one' with `lval`=1 and `right` has `key1`='foo', `key2`='one' with `rval`=4 and another row with `key1`='foo', `key2`='one' with `rval`=5, what is the number of rows in the output of `pd.merge(left, right, on=["key1", "key2"], how="inner")`?
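
A minimal sketch of this scenario follows. It assumes `left` and `right` contain only the rows named in the question; any further rows in the original frames are not reproduced here.

```python
import pandas as pd

# Only the rows described in the question (an assumption; the full frames may be larger)
left = pd.DataFrame({"key1": ["foo"], "key2": ["one"], "lval": [1]})
right = pd.DataFrame(
    {"key1": ["foo", "foo"], "key2": ["one", "one"], "rval": [4, 5]}
)

# Join on the combination of both key columns
merged = pd.merge(left, right, on=["key1", "key2"], how="inner")

print(merged)
print(len(merged))
```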

Question 41

When merging `left1` on column 'key' and `right1` on its index, what arguments should be passed to `pd.merge`?

Question 42

Consider the numpy array `arr = np.arange(12).reshape((3, 4))`. What is the shape of the output of `np.concatenate([arr, arr], axis=1)`?
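
The array in this question can be built and concatenated in a few lines. The sketch below prints the shape of the input and of the result; the name `joined` is an illustrative choice.

```python
import numpy as np

# Array from the question: 3 rows, 4 columns
arr = np.arange(12).reshape((3, 4))

# axis=1 joins the arrays side by side, along the column axis
joined = np.concatenate([arr, arr], axis=1)

print(arr.shape)
print(joined.shape)
```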

Question 43

If `s1` is a Series with index ['a', 'b'] and `s4` is a Series with index ['a', 'b', 'f', 'g'], what happens to the 'f' and 'g' labels in the output of `pd.concat([s1, s4], axis="columns", join="inner")`?
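
This behavior is easy to observe with two short Series. In the sketch below the index labels come from the question, while the values are made up for illustration.

```python
import pandas as pd

# Index labels from the question; the values are hypothetical
s1 = pd.Series([0, 1], index=["a", "b"])
s4 = pd.Series([0, 1, 5, 6], index=["a", "b", "f", "g"])

# join="inner" keeps only the labels present in every input
combined = pd.concat([s1, s4], axis="columns", join="inner")

print(combined)
```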

Question 44

You have a list of DataFrames `[df1, df2]` where the row index does not contain relevant data. Which combination of arguments to `pd.concat` will combine them vertically and create a new, continuous integer index?

Question 45

If `a` and `b` are two Series with overlapping indexes and some null values, how does `a.combine_first(b)` determine the values in the resulting Series?

Question 46

If `df1` has a value of 1.0 in column 'a' at index 0, and `df2` has a value of 5.0 in column 'a' at index 0, what will be the value in column 'a' at index 0 of the result of `df1.combine_first(df2)`?
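
A small sketch can confirm how `combine_first` treats a cell that is populated in both frames. The values at index 0 come from the question; the second row, where `df1` is missing a value, is a hypothetical addition that shows the contrasting case.

```python
import numpy as np
import pandas as pd

# Index 0 matches the question; index 1 is a hypothetical extra row
df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [5.0, 4.0]})

# Values from df1 are used where present; df2 fills only the gaps
result = df1.combine_first(df2)

print(result)
```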

Question 47

When reshaping a DataFrame using `df.unstack(level="state")`, what does the unstacked level become in the resulting DataFrame's structure?

Question 48

If `long_data.pivot()` is called without the `values` argument, and there are multiple potential value columns, what is the structure of the resulting DataFrame?

Question 49

When is `pandas.melt` particularly useful without specifying any `id_vars`?

Question 50

A DataFrame `frame` is created with a hierarchical index with `key1` and `key2`, and columns with `state` and `color`. Given `frame.index.names = ["key1", "key2"]`, how many levels does the row index have?
