Data Cleaning and Preparation
50 questions available
Questions
By default, what is the behavior of the `dropna()` method when applied to a pandas DataFrame?
What is the effect of passing `how="all"` as an argument to the `data.dropna()` method on a DataFrame?
Suppose you want to keep only the rows in a DataFrame that have at least a certain number of non-missing values. Which argument should you use with the `dropna()` method?
When using the `fillna()` method on a DataFrame, what is accomplished by passing a dictionary to it?
Which method is considered the workhorse function for replacing missing values in a pandas DataFrame or Series?
What does the DataFrame method `duplicated()` return?
By default, the `duplicated()` and `drop_duplicates()` methods keep the first observed value combination. How can you modify this behavior to keep the last observed combination instead?
What is the primary use of the `map` method on a pandas Series in the context of data transformation?
Given the pandas Series `data = pd.Series([1., -999., 2., -999., -1000., 3.])`, what is the result of calling `data.replace(-999, np.nan)`?
If you want to replace multiple different values with a single substitute value in a pandas Series, how should you use the `replace` method?
How can you create a transformed version of a DataFrame with renamed index and column labels without modifying the original DataFrame?
What is the primary function of `pandas.cut`?
In the string representation of an interval returned by `pandas.cut`, such as `(18, 25]`, what does the square bracket `]` signify?
What is the main difference between the `pandas.cut` and `pandas.qcut` functions?
To select all rows in a DataFrame `data` that have a value in any of their columns exceeding 3 in absolute value, which line of code is correct?
What does the `numpy.random.permutation()` function produce when called with the length of an axis?
How can you select a random subset of 3 rows from a DataFrame `df` without replacement?
What is the purpose of the `pandas.get_dummies` function?
If a column in a DataFrame contains strings where multiple categories are separated by a delimiter (e.g., 'Animation|Children's|Comedy'), which method is specially designed to create dummy variables from it?
Why did pandas develop an extension type system, departing from its original reliance on NumPy types?
When creating a pandas Series of integers with a missing value using an extension type, what data type should be specified to avoid converting the Series to float64?
What is the primary difference between Python's built-in `find()` and `index()` string methods?
In the context of regular expressions in Python, why is it highly recommended to use the `re.compile()` function?
What is the difference between the `re.search()` and `re.match()` methods?
In pandas, how do you access array-oriented methods for string operations on a Series that correctly handle missing (NA) values?
Given a pandas Series `data` containing email addresses and NA values, what does the method `data.str.findall(pattern, flags=re.IGNORECASE)` return for a row containing an NA value?
What is the purpose of the `.str.extract()` method on a pandas Series?
In data warehousing, what is the best practice for representing a column with many repeated values, as described in the chapter?
When a pandas Series is converted to the 'category' dtype, what two main components does the underlying Categorical object have?
If you have an array of integer codes and an array of corresponding category labels from an external source, which constructor should you use to create a `pandas.Categorical` object?
How can you make an unordered categorical Series instance ordered in pandas?
Why can GroupBy operations be significantly faster when performed on categorical data compared to string data?
In a pandas Series `cat_s` with a categorical dtype, how do you access the categorical methods like `set_categories` or `remove_unused_categories`?
After filtering a large DataFrame, many of the original categories in a categorical column may no longer be present in the data. Which method can be used to trim these unobserved categories?
What is another term for creating dummy variables from categorical data, as mentioned in the section 'Creating dummy variables for modeling'?
Consider the Series `s = pd.Series(['a', 'b', 'c', 'd'] * 2, dtype='category')`. What will be the output of `pd.get_dummies(s)`?
In a pandas Series created with `pd.Series([1, 2, None], dtype='float64')`, what value is at index 2?
Given a DataFrame `df`, what is the result of `df.fillna(method="ffill", limit=2)`?
What does the `precision` argument in `pd.cut(data, 4, precision=2)` do?
Consider the code `data[data.abs() > 3] = np.sign(data) * 3`. What is its effect on the DataFrame `data`?
What is the difference between `data.replace()` and `data.str.replace()` for a pandas Series?
In regular expressions, what does the `findall` method return when the pattern contains capturing groups?
How can you slice substrings from each element in a pandas Series `data` in a vectorized way?
Consider the code `pd.get_dummies(pd.cut(values, bins))`. What is the useful application of this combination of functions?
If you have a pandas Series `cat_s2` with 5 defined categories ('a' through 'e') but the data only contains 'a', 'b', 'c', 'd', what will `cat_s2.value_counts()` show for category 'e'?
What is the return type of the `.codes` attribute of a pandas Categorical object?
Given `ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]` and `bins = [18, 25, 35, 60, 100]`, how many values fall into the `(18, 25]` bin when `pd.cut(ages, bins)` is called?
Which pandas method is specifically designed to perform a vectorized set membership check?
What does the pandas `value_counts()` method return?
How can you get an index array from an array of possibly non-distinct values into another array of distinct values, which is helpful for data alignment?