Data Wrangling: Join, Combine, and Reshape
50 questions available
Questions
What is the primary function of the `unstack` method when applied to a hierarchically indexed Series in pandas?
Which method is the inverse operation of `unstack` in pandas, used for pivoting columns into rows?
When using `pandas.merge` without specifying the `on` or `how` arguments, what is the default behavior?
What does the DataFrame `set_index` method accomplish?
When performing a many-to-many merge in pandas, how is the resulting number of rows determined for matching keys?
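A minimal sketch of the many-to-many case, using hypothetical frames `left` and `right` (not from the source). Each matching left row is paired with every matching right row, so the row count for a key is the product of its counts on the two sides.

```python
import pandas as pd

# Hypothetical data: "b" appears twice on the left and three times on the right
left = pd.DataFrame({"key": ["b", "b", "a"], "lval": [0, 1, 2]})
right = pd.DataFrame({"key": ["b", "b", "b", "a"], "rval": [10, 11, 12, 13]})

result = pd.merge(left, right, on="key")

# "b": 2 * 3 = 6 rows, "a": 1 * 1 = 1 row
print(len(result))  # 7
```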
What is the purpose of the `suffixes` argument in the `pandas.merge` function?
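A small sketch to experiment with, assuming two hypothetical frames that share a non-key column named `val`:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b"], "val": [1, 2]})
right = pd.DataFrame({"key": ["a", "b"], "val": [3, 4]})

# Overlapping non-key column names get "_x" / "_y" appended by default
default = pd.merge(left, right, on="key")
print(list(default.columns))  # ['key', 'val_x', 'val_y']

# Custom suffixes replace the defaults
custom = pd.merge(left, right, on="key", suffixes=("_left", "_right"))
print(list(custom.columns))  # ['key', 'val_left', 'val_right']
```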
What is the primary difference between the DataFrame's `join` method and `pandas.merge`?
What does the `combine_first` method do when used on two pandas Series or DataFrames?
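A minimal sketch with two hypothetical Series. Values from the calling object are kept where present, nulls are patched from the argument, and the result covers the union of both indexes.

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2.5, np.nan, 3.5], index=["a", "b", "c", "d"])
b = pd.Series([0.0, np.nan, 2.0, 7.0], index=["a", "b", "c", "e"])

# Non-null values in `a` win; NaNs in `a` are filled from `b`;
# the label "e" exists only in `b`, so it is taken from `b`
result = a.combine_first(b)
print(result.to_dict())
```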
When using `pandas.concat` to combine several Series objects with `axis="columns"`, what do the labels passed in the `keys` argument become in the resulting DataFrame?
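A short sketch using two hypothetical Series with non-overlapping indexes:

```python
import pandas as pd

s1 = pd.Series([0, 1], index=["a", "b"])
s2 = pd.Series([2, 3, 4], index=["c", "d", "e"])

# Each Series becomes one column, labeled by its entry in `keys`;
# rows missing from a Series are filled with NaN
result = pd.concat([s1, s2], axis="columns", keys=["one", "two"])
print(list(result.columns))  # ['one', 'two']
print(result.shape)          # (5, 2)
```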
What is the effect of passing `ignore_index=True` to the `pandas.concat` function?
What is the purpose of the DataFrame `pivot` method?
Which method is described as the inverse operation to `pivot` for DataFrames, transforming data from a wide to a long format?
In the `pandas.melt` function, what is the purpose of the `id_vars` argument?
When sorting a hierarchically indexed object, what is the significance of the index being lexicographically sorted?
How can you aggregate a DataFrame by a specific index level for summary statistics?
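One way to do this is with `groupby(level=...)`. The sketch below builds a small hypothetical `frame` with named row and column levels and sums by the `key2` level:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Group rows by the values of the "key2" index level and sum each column
by_key2 = frame.groupby(level="key2").sum()
print(by_key2)
```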
What is the result of applying the `stack` method to the DataFrame created by `data.unstack()` in the code snippet `data = pd.Series([0.9, 0.2, 0.6, 0.7], index=[['a', 'a', 'b', 'b'],[1, 2, 1, 2]])`?
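A sketch to check the round trip with the `data` Series from the question:

```python
import pandas as pd

data = pd.Series([0.9, 0.2, 0.6, 0.7], index=[["a", "a", "b", "b"], [1, 2, 1, 2]])

wide = data.unstack()     # inner level (1, 2) becomes the columns
roundtrip = wide.stack()  # the columns pivot back into the inner row level

print(wide.shape)                             # (2, 2)
print(roundtrip.to_dict() == data.to_dict())  # True
```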
When merging two DataFrames, `df1` and `df2`, on a key that results in a many-to-one join, how are the index values of the output DataFrame determined by default?
Which `how` argument value in `pandas.merge` will result in a DataFrame containing the union of keys from both input DataFrames?
If you perform an outer join on `df1` (with key 'c') and `df2` (with key 'd'), what values will appear in the columns corresponding to the non-matching DataFrame?
To merge a DataFrame `lefth` with columns `key1`, `key2` and a DataFrame `righth` with a hierarchical index, how must you specify the join keys?
In a DataFrame `frame` with a MultiIndex on the rows with levels named `key1` and `key2`, what does the method `frame.swaplevel("key1", "key2")` do?
If you concatenate two DataFrames with overlapping row indexes but different columns using `pd.concat([df1, df2], axis="columns")`, what is the outcome for rows that exist in one DataFrame but not the other?
When using `pandas.pivot` to reshape a DataFrame, if the specified `index` and `columns` arguments result in multiple values for a given cell, what happens?
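A sketch to experiment with, assuming hypothetical long-format data in which two rows map to the same (`date`, `item`) cell. `pivot_table` is shown as the aggregating alternative:

```python
import pandas as pd

# Hypothetical data: two rows share the ("2000-01-01", "infl") cell
long_data = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-01", "2000-01-01"],
    "item": ["infl", "infl", "realgdp"],
    "value": [1.0, 2.0, 3.0],
})

raised = False
try:
    long_data.pivot(index="date", columns="item", values="value")
except ValueError as exc:
    # pivot will not choose between duplicate entries for one cell
    raised = True
    print("pivot failed:", exc)

# pivot_table aggregates duplicates instead (mean by default)
table = long_data.pivot_table(index="date", columns="item", values="value")
print(table.loc["2000-01-01", "infl"])  # 1.5
```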
What is a key difference between `stack` and `melt`?
Consider a DataFrame `df` with columns A, B, C, D. What is the result of `pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])`?
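A quick sketch with hypothetical values in columns A through D:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]})

# A is kept as an identifier; only B and C are unpivoted; D is left out
melted = pd.melt(df, id_vars=["A"], value_vars=["B", "C"])
print(list(melted.columns))  # ['A', 'variable', 'value']
print(melted.shape)          # (6, 3)
```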
By default, the `unstack` method pivots which level of a MultiIndex?
If unstacking a level of a DataFrame leaves some subgroups without all of the values present in that level, what does pandas introduce into the resulting DataFrame?
Consider the code: `df1 = pd.DataFrame({'key': ['b', 'b', 'a'], 'data1': [0, 1, 2]})` and `df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': [0, 1, 2]})`. What is the number of rows in the output of `pd.merge(df1, df2, how='left')`?
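A sketch that compares all four `how` options on the `df1` and `df2` from the question. The merge key is the shared `key` column:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["b", "b", "a"], "data1": [0, 1, 2]})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": [0, 1, 2]})

counts = {}
for how in ["inner", "left", "right", "outer"]:
    # Without `on`, pandas merges on the overlapping column name ("key")
    counts[how] = len(pd.merge(df1, df2, how=how))

print(counts)  # {'inner': 3, 'left': 3, 'right': 4, 'outer': 4}
```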
What type of object does `pd.MultiIndex.from_arrays` return?
If `frame` has a hierarchical index on its columns with levels named `state` and `color`, how would you select all columns under the `Ohio` state?
What is the key difference in output between `stack()` and `stack(dropna=False)`?
If DataFrames `df3` and `df4` have key columns `lkey` and `rkey`, respectively, and are merged with `pd.merge(df3, df4, left_on="lkey", right_on="rkey")`, what happens to the `rkey` column in the output?
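A sketch to try this, assuming hypothetical contents for `df3` and `df4`:

```python
import pandas as pd

df3 = pd.DataFrame({"lkey": ["b", "b", "a"], "data1": [0, 1, 2]})
df4 = pd.DataFrame({"rkey": ["a", "b", "d"], "data2": [0, 1, 2]})

# Key columns with different names are matched but both are kept in the output
result = pd.merge(df3, df4, left_on="lkey", right_on="rkey")
print(list(result.columns))  # ['lkey', 'data1', 'rkey', 'data2']
```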
What are the three fundamental data combination operations in pandas mentioned at the beginning of Section 8.2?
If `left2` and `right2` are DataFrames with different columns but partially overlapping indexes, what is the result of `left2.join(right2, how="outer")`?
When using `pd.concat`, how can you create a hierarchical index on the concatenation axis to identify the original pieces of data?
Consider the DataFrame `data` created in In [126]. What is the shape of the output of `result = data.stack()`?
If `long_data` is a DataFrame in long format with columns `date`, `item`, and `value`, what does the code `pivoted = long_data.pivot(index="date", columns="item", values="value")` produce?
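A sketch with a few hypothetical rows of `long_data`:

```python
import pandas as pd

long_data = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02"],
    "item": ["realgdp", "infl", "realgdp", "infl"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# One row per date, one column per distinct item, cells filled from "value"
pivoted = long_data.pivot(index="date", columns="item", values="value")
print(pivoted.shape)          # (2, 2)
print(list(pivoted.columns))  # ['infl', 'realgdp']
```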
What is the key difference between the default behavior of `set_index` and `set_index(drop=False)`?
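A small sketch using a hypothetical `frame`:

```python
import pandas as pd

frame = pd.DataFrame({"a": range(3), "b": range(3, 0, -1), "c": ["one", "one", "two"]})

moved = frame.set_index("c")             # "c" becomes the index and is dropped from the columns
kept = frame.set_index("c", drop=False)  # "c" becomes the index and also stays a column

print(list(moved.columns))  # ['a', 'b']
print(list(kept.columns))   # ['a', 'b', 'c']
```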
When is it appropriate to use `left_on` and `right_on` arguments in `pandas.merge`?
If `left` has `key1`='foo', `key2`='one' with `lval`=1 and `right` has `key1`='foo', `key2`='one' with `rval`=4 and another row with `key1`='foo', `key2`='one' with `rval`=5, what is the number of rows in the output of `pd.merge(left, right, on=["key1", "key2"], how="inner")`?
When merging `left1` on column 'key' and `right1` on its index, what arguments should be passed to `pd.merge`?
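A sketch to try this, assuming hypothetical contents for `left1` and `right1`:

```python
import pandas as pd

left1 = pd.DataFrame({"key": ["a", "b", "a", "c"], "value": [0, 1, 2, 3]})
right1 = pd.DataFrame({"group_val": [3.5, 7.0]}, index=["a", "b"])

# Match the left "key" column against the right index (inner join by default,
# so the "c" row has no partner and is dropped)
result = pd.merge(left1, right1, left_on="key", right_index=True)
print(result)
```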
Consider the NumPy array `arr = np.arange(12).reshape((3, 4))`. What is the shape of the output of `np.concatenate([arr, arr], axis=1)`?
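A quick check of both concatenation axes on the `arr` from the question:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

wide = np.concatenate([arr, arr], axis=1)  # side by side: columns add up
tall = np.concatenate([arr, arr], axis=0)  # stacked: rows add up

print(wide.shape)  # (3, 8)
print(tall.shape)  # (6, 4)
```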
If `s1` is a Series with index ['a', 'b'] and `s4` is a Series with index ['a', 'b', 'f', 'g'], what happens to the 'f' and 'g' labels in the output of `pd.concat([s1, s4], axis="columns", join="inner")`?
You have a list of DataFrames `[df1, df2]` where the row index does not contain relevant data. Which combination of arguments to `pd.concat` will combine them vertically and create a new, continuous integer index?
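A sketch using hypothetical `df1` and `df2` whose index labels carry no meaning:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]}, index=[10, 20])
df2 = pd.DataFrame({"a": [3]}, index=[10])

# Concatenate along rows (the default axis) and discard the original labels
stacked = pd.concat([df1, df2], ignore_index=True)
print(list(stacked.index))  # [0, 1, 2]
```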
If `a` and `b` are two Series with overlapping indexes and some null values, how does `a.combine_first(b)` determine the values in the resulting Series?
If `df1` has a value of 1.0 in column 'a' at index 0, and `df2` has a value of 5.0 in column 'a' at index 0, what will be the value in column 'a' at index 0 of the result of `df1.combine_first(df2)`?
When reshaping a DataFrame using `df.unstack(level="state")`, what does the unstacked level become in the resulting DataFrame's structure?
If `long_data.pivot()` is called without the `values` argument, and there are multiple potential value columns, what is the structure of the resulting DataFrame?
When is `pandas.melt` particularly useful without specifying any `id_vars`?
A DataFrame `frame` has a hierarchical row index with levels `key1` and `key2` and hierarchical columns with levels `state` and `color`. Given `frame.index.names = ["key1", "key2"]`, how many levels does the row index have?