Python for Data Analysis: Data Wrangling with pandas, NumPy & Jupyter: Data Cleaning and Preparation

Question 19 of 50

If a column in a DataFrame contains strings where multiple categories are separated by a delimiter (e.g., 'Animation|Children's|Comedy'), which method is specially designed to create dummy variables from it?

Correct answer: The `.str.get_dummies()` method.

Explanation

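This question assesses knowledge of the specialized `.str.get_dummies()` method, which handles the common real-world case of multi-category string data. The method splits each string on a delimiter (the default is `'|'`) and returns a DataFrame of 0/1 indicator columns, one per distinct category, so a row such as 'Animation|Children's|Comedy' gets a 1 in each of those three columns. The general-purpose `pandas.get_dummies` function, by contrast, treats each whole string as a single category.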
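
A minimal sketch of the method in use follows. The `genres` Series and its three rows are made-up illustration data, not taken from the chapter; only `Series.str.get_dummies` and `pandas.get_dummies` are real pandas calls.

```python
import pandas as pd

# Hypothetical sample data: each row lists several genres separated by '|'.
genres = pd.Series([
    "Animation|Children's|Comedy",
    "Adventure|Children's|Fantasy",
    "Comedy|Romance",
])

# Split each string on '|' and build one 0/1 indicator column per genre.
dummies = genres.str.get_dummies(sep="|")
print(dummies)

# For contrast, pandas.get_dummies treats each whole string as one category,
# so it yields one column per distinct row value rather than per genre.
whole_string_dummies = pd.get_dummies(genres)
print(whole_string_dummies.shape)
```

In this sketch `dummies` has one column for each of the six distinct genres, while `whole_string_dummies` has only three columns, one per unique full string.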

Other questions

Question 1

By default, what is the behavior of the `dropna()` method when applied to a pandas DataFrame?

Question 2

What is the effect of passing `how="all"` as an argument to the `data.dropna()` method on a DataFrame?

Question 3

Suppose you want to keep only the rows in a DataFrame that have at least a certain number of non-missing values. Which argument should you use with the `dropna()` method?

Question 4

When using the `fillna()` method on a DataFrame, what is accomplished by passing a dictionary to it?

Question 5

Which method is considered the workhorse function for replacing missing values in a pandas DataFrame or Series?

Question 6

What does the DataFrame method `duplicated()` return?

Question 7

By default, the `duplicated()` and `drop_duplicates()` methods keep the first observed value combination. How can you modify this behavior to keep the last observed combination instead?

Question 8

What is the primary use of the `map` method on a pandas Series in the context of data transformation?

Question 9

Given the pandas Series `data = pd.Series([1., -999., 2., -999., -1000., 3.])`, what is the result of calling `data.replace(-999, np.nan)`?

Question 10

If you want to replace multiple different values with a single substitute value in a pandas Series, how should you use the `replace` method?

Question 11

How can you create a transformed version of a DataFrame with renamed index and column labels without modifying the original DataFrame?

Question 12

What is the primary function of `pandas.cut`?

Question 13

In the string representation of an interval returned by `pandas.cut`, such as `(18, 25]`, what does the square bracket `]` signify?

Question 14

What is the main difference between the `pandas.cut` and `pandas.qcut` functions?

Question 15

To select all rows in a DataFrame `data` that have a value in any of their columns exceeding 3 in absolute value, which line of code is correct?

Question 16

What does the `numpy.random.permutation()` function produce when called with the length of an axis?

Question 17

How can you select a random subset of 3 rows from a DataFrame `df` without replacement?

Question 18

What is the purpose of the `pandas.get_dummies` function?

Question 20

Why did pandas develop an extension type system, departing from its original reliance on NumPy types?

Question 21

When creating a pandas Series of integers with a missing value using an extension type, what data type should be specified to avoid converting the Series to float64?

Question 22

What is the primary difference between Python's built-in `find()` and `index()` string methods?

Question 23

In the context of regular expressions in Python, why is it highly recommended to use the `re.compile()` function?

Question 24

What is the difference between the `re.search()` and `re.match()` methods?

Question 25

In pandas, how do you access array-oriented methods for string operations on a Series that correctly handle missing (NA) values?

Question 26

Given a pandas Series `data` containing email addresses and NA values, what does the method `data.str.findall(pattern, flags=re.IGNORECASE)` return for a row containing an NA value?

Question 27

What is the purpose of the `.str.extract()` method on a pandas Series?

Question 28

In data warehousing, what is the best practice for representing a column with many repeated values, as described in the chapter?

Question 29

When a pandas Series is converted to the 'category' dtype, what two main components does the underlying Categorical object have?

Question 30

If you have an array of integer codes and an array of corresponding category labels from an external source, which constructor should you use to create a `pandas.Categorical` object?

Question 31

How can you make an unordered categorical Series instance ordered in pandas?

Question 32

Why can GroupBy operations be significantly faster when performed on categorical data compared to string data?

Question 33

In a pandas Series `cat_s` with a categorical dtype, how do you access the categorical methods like `set_categories` or `remove_unused_categories`?

Question 34

After filtering a large DataFrame, many of the original categories in a categorical column may no longer be present in the data. Which method can be used to trim these unobserved categories?

Question 35

What is another term for creating dummy variables from categorical data, as mentioned in the section 'Creating dummy variables for modeling'?

Question 36

Consider the Series `s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype='category')`. What will be the output of `pd.get_dummies(s)`?

Question 37

In a pandas Series created with `pd.Series([1, 2, None], dtype='float64')`, what value is at index 2?

Question 38

Given a DataFrame `df`, what is the result of `df.fillna(method="ffill", limit=2)`?

Question 39

What does the `precision` argument in `pd.cut(data, 4, precision=2)` do?

Question 40

Consider the code `data[data.abs() > 3] = np.sign(data) * 3`. What is its effect on the DataFrame `data`?

Question 41

What is the difference between `data.replace()` and `data.str.replace()` for a pandas Series?

Question 42

In regular expressions, what does the `findall` method return when the pattern contains capturing groups?

Question 43

How can you slice substrings from each element in a pandas Series `data` in a vectorized way?

Question 44

Consider the code `pd.get_dummies(pd.cut(values, bins))`. What is the useful application of this combination of functions?

Question 45

If you have a pandas Series `cat_s2` with 5 defined categories ('a' through 'e') but the data only contains 'a', 'b', 'c', 'd', what will `cat_s2.value_counts()` show for category 'e'?

Question 46

What is the return type of the `.codes` attribute of a pandas Categorical object?

Question 47

Given `ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]` and `bins = [18, 25, 35, 60, 100]`, how many values fall into the `(18, 25]` bin when `pd.cut(ages, bins)` is called?

Question 48

Which pandas method is specifically designed to perform a vectorized set membership check?

Question 49

What does the pandas `value_counts()` method return?

Question 50

How can you get an index array from an array of possibly non-distinct values into another array of distinct values, which is helpful for data alignment?