Python for Data Analysis: Data Wrangling with pandas, NumPy & Jupyter

Data Loading, Storage, and File Formats

Questions

Question 1

When reading a CSV file that does not contain a header row using pandas, which argument should be passed to `read_csv` to prevent the first data row from being incorrectly used as column names?

Question 2

To read the file 'examples/ex2.csv' and assign the column names 'a', 'b', 'c', 'd', and 'message', which is the correct syntax for the `pandas.read_csv` function?

Question 3

When reading 'examples/ex2.csv', how can you specify that the 'message' column should become the index of the resulting DataFrame?
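
A minimal sketch of the options behind Questions 1 to 3. The CSV text below is a hypothetical in-memory stand-in for 'examples/ex2.csv' (only the five-column layout comes from the questions). It is read through `io.StringIO` so the example runs without the file.

```python
import io

import pandas as pd

# Hypothetical headerless data shaped like 'examples/ex2.csv'
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# header=None: the first line is data, so pandas numbers the columns 0..4
df_default = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply column labels explicitly;
# index_col="message": turn that column into the row index
names = ["a", "b", "c", "d", "message"]
df = pd.read_csv(io.StringIO(raw), names=names, index_col="message")
print(df)
```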

Question 4

The file 'examples/ex3.txt' has fields separated by a variable amount of whitespace. What value should be passed to the `sep` argument in `pandas.read_csv` to correctly parse this file?
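
As an illustration, assume some hypothetical whitespace-aligned text in place of 'examples/ex3.txt'. `sep` accepts a regular expression, and `r"\s+"` matches one or more whitespace characters. The header line below has one fewer field than the data rows, so pandas uses the first column as the index.

```python
import io

import pandas as pd

# Hypothetical whitespace-aligned text; field widths vary from line to line
raw = (
    "      A      B      C\n"
    "aaa   1.5    2.0    3.25\n"
    "bbb  -0.5    4.0    1.0\n"
)

# r"\s+" treats any run of spaces or tabs as a single separator
result = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(result)
```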

Question 5

The file 'examples/ex4.csv' contains comment lines on its first, third, and fourth rows that should be ignored. How can you skip these specific rows using `pandas.read_csv`?
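
The following sketch uses hypothetical file contents. `skiprows` takes zero-based line numbers, so `[0, 2, 3]` drops the first, third, and fourth lines. When every unwanted line starts with a comment marker, `comment="#"` is an alternative.

```python
import io

import pandas as pd

# Hypothetical contents with comment lines at positions 0, 2 and 3
raw = (
    "# first comment\n"
    "a,b,c,d,message\n"
    "# second comment\n"
    "# third comment\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
)

# Skip lines by their zero-based position in the file
df = pd.read_csv(io.StringIO(raw), skiprows=[0, 2, 3])

# Alternative: ignore any line that starts with the comment character
df_alt = pd.read_csv(io.StringIO(raw), comment="#")
print(df)
```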

Question 6

What is the function of the `na_values` option when used in the `pandas.read_csv` function?

Question 7

In `pandas.read_csv`, what is the effect of setting the `keep_default_na` option to `False`?
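
This small sketch uses hypothetical data to compare three cases for Questions 6 and 7: the default missing-value handling, an extra `na_values` entry, and `keep_default_na=False`.

```python
import io

import pandas as pd

raw = "key,value\nx,NA\ny,missing\nz,3\n"

# Default: built-in sentinels such as "NA" become NaN; "missing" does not
default = pd.read_csv(io.StringIO(raw))

# na_values: add more strings to treat as missing, on top of the defaults
extra = pd.read_csv(io.StringIO(raw), na_values=["missing"])

# keep_default_na=False: drop the built-in sentinels, so only the
# strings listed in na_values are treated as missing
only_custom = pd.read_csv(
    io.StringIO(raw), keep_default_na=False, na_values=["missing"]
)
```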

Question 8

When the `chunksize` argument is used with `pandas.read_csv`, what is the data type of the object that is returned?

Question 9

What is the primary reason for using the `chunksize` or `iterator` arguments when reading a large file with pandas?

Question 10

If the command `pd.read_csv("examples/ex6.csv", nrows=5)` is executed, how many rows of data, not including the header, will be read from the file and included in the resulting DataFrame?

Question 11

When writing a pandas DataFrame to a text file, missing values (NaN) appear as empty strings by default. How can you specify that they should be written as the string 'NULL' instead?

Question 12

Which combination of arguments for the `to_csv` method will write a DataFrame's data to a file without the row index and without the column headers?
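
For Questions 11 and 12, this sketch writes a hypothetical DataFrame to an in-memory buffer rather than a file. Passing `sys.stdout` as the first argument would print the same text to the console instead.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})

buf = io.StringIO()
# na_rep: text written for missing values (default is an empty string)
# index=False / header=False: omit row labels and column names
df.to_csv(buf, na_rep="NULL", index=False, header=False)
text = buf.getvalue()
print(text)
```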

Question 13

To convert a JSON string into a Python object (like a dictionary or list), which function from Python's standard `json` library should be used?
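
Here is a minimal round trip with the standard library, using a hypothetical JSON document. `json.loads` parses a string into Python objects, turning JSON `null` into `None`. `json.dumps` converts those objects back to a string.

```python
import json

obj_text = '{"name": "Ana", "cities": ["Akron", "Nashville"], "pet": null}'

# Parse JSON text into Python dicts, lists, strings, numbers and None
result = json.loads(obj_text)

# Serialize the Python object back to a JSON string
back = json.dumps(result)
print(result["cities"], back)
```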

Question 14

What is the default assumption made by `pandas.read_json` when converting a JSON dataset into a DataFrame?

Question 15

To export a pandas DataFrame to a JSON string using the `to_json` method, which value for the `orient` argument produces a JSON array of dictionaries, where each dictionary represents a row?
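
This sketch uses a hypothetical two-row DataFrame to show `orient="records"` and then reads the result back with `pandas.read_json`. The string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON text to `read_json`.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# orient="records": a JSON array with one object per row
records = df.to_json(orient="records")
parsed = json.loads(records)

# Read the records back into a DataFrame
round_trip = pd.read_json(io.StringIO(records), orient="records")
print(records)
```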

Question 16

By default, what does the `pandas.read_html` function search for in an HTML document, and what data structure does it return?

Question 17

In the example using `lxml.objectify` to parse the MTA performance XML file, what type of object is returned by `root.INDICATOR`?

Question 18

What is the primary drawback of using the pickle format for long-term data storage, as mentioned in the chapter?

Question 19

For reading tabular data from a Microsoft Excel file that may contain multiple sheets, which pandas class can be used to create an object that facilitates parsing data from specific sheets?

Question 20

What is the purpose of the `HDFStore` class in pandas?

Question 21

The HDFStore object supports two storage schemas: 'fixed' and 'table'. What is the key difference between them as described in the chapter?

Question 22

What is the purpose of the `store.put()` method in the context of an HDFStore object?

Question 23

When interacting with a web API using the `requests` library, which method should be called on the response object to get the parsed JSON data as a Python object?

Question 24

When selecting data from a table using a standard Python SQL driver like `sqlite3`, what does the `cursor.fetchall()` method typically return?

Question 25

When creating a pandas DataFrame from the results of a `sqlite3` query, besides the data rows, what other piece of information is required from the cursor object to correctly label the columns?
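
For Questions 24 and 25, this sketch uses an in-memory SQLite database with a hypothetical `test` table. `fetchall()` returns a list of tuples. The column names come from `cursor.description`, where each entry is a 7-item sequence with the name first.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
con.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [("x", "p", 1.25, 6), ("y", "q", 2.5, 3)],  # hypothetical rows
)
con.commit()

cursor = con.execute("SELECT * FROM test")
rows = cursor.fetchall()  # list of tuples, one per row

# The first element of each description entry is the column name
columns = [desc[0] for desc in cursor.description]
df = pd.DataFrame(rows, columns=columns)
con.close()
```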

Question 26

What is the primary advantage of using SQLAlchemy in conjunction with pandas for database operations, as described in the chapter?

Question 27

In the example `pd.read_csv("examples/csv_mindex.csv", index_col=["key1", "key2"])`, what kind of index is created in the resulting DataFrame?
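
This sketch uses hypothetical data in the layout of 'examples/csv_mindex.csv'. Passing a list to `index_col` builds a hierarchical `MultiIndex` from those columns.

```python
import io

import pandas as pd

raw = "key1,key2,value1,value2\none,a,1,2\none,b,3,4\ntwo,a,5,6\n"

# A list of column names produces a two-level row index
df = pd.read_csv(io.StringIO(raw), index_col=["key1", "key2"])
print(df)
```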

Question 28

According to Table 6-2, which argument in `pandas.read_csv` is used to provide a dictionary mapping column names or numbers to functions that should be applied to the data in those columns during parsing?
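
To illustrate `converters`, assume a hypothetical file with zero-padded codes and dollar-formatted prices. Each function receives the raw string for every cell in its column.

```python
import io

import pandas as pd

raw = "code,price\n007,$3.50\n042,$10.00\n"

df = pd.read_csv(
    io.StringIO(raw),
    converters={
        "code": str,  # keep leading zeros instead of parsing as integers
        "price": lambda s: float(s.replace("$", "")),  # strip the currency sign
    },
)
print(df)
```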

Question 29

What is a key characteristic of data formats like HDF5, ORC, and Parquet, as contrasted with text formats like CSV?

Question 30

In the `csv` module, what does the `quoting` dialect option `csv.QUOTE_MINIMAL` specify?

Question 31

In the example of parsing the `fdic_failed_bank_list.html` file, `len(tables)` returns 1. Why does `pandas.read_html` return a list with one element instead of just a single DataFrame?

Question 32

What is the function of `pandas.read_pickle`?
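
Here is a minimal round trip through a temporary file. The DataFrame and the file name `frame.pkl` are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)                # serialize with Python's pickle protocol
    restored = pd.read_pickle(path)   # load the object back from disk
```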

Question 33

To write a pandas DataFrame to an Excel file, the text describes a process involving an `ExcelWriter` object. Which of the following is NOT part of that process?

Question 34

What does the 'HDF' in HDF5 stand for?

Question 35

According to the text, what is a major benefit of using the HDF5 format for working with datasets that don't fit into memory?

Question 36

After making an HTTP GET request with the `requests` library, what is the 'good practice' recommended by the text to check for HTTP errors?

Question 37

Which pandas function is specifically designed to read data from a file in a fixed-width column format (i.e., with no delimiters)?
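
This sketch of `pandas.read_fwf` uses hypothetical fixed-width text. `colspecs` gives a half-open `(start, end)` character range for each field, and `widths=[4, 6, 3]` would describe the same layout. Padding spaces are stripped from the parsed values.

```python
import io

import pandas as pd

# Hypothetical fixed-width records: 4-char id, 6-char value, 3-char flag
raw = "AB12  1.50  x\nCD34 20.25  y\n"

df = pd.read_fwf(
    io.StringIO(raw),
    colspecs=[(0, 4), (4, 10), (10, 13)],
    header=None,
    names=["id", "value", "flag"],
)
print(df)
```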

Question 38

What does the code `data.to_csv(sys.stdout, sep='|')` do?

Question 39

When parsing 'examples/ex5.csv', a dictionary is passed to `na_values`: `sentinels = {"message": ["foo", "NA"], "something": ["two"]}`. What is the effect of this?
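
This sketch shows the dictionary form with hypothetical rows in the shape of 'examples/ex5.csv'. Each key names a column, and the strings in its list are treated as missing only in that column.

```python
import io

import pandas as pd

raw = "something,a,message\none,1,NA\ntwo,5,foo\nthree,9,world\n"

# Per-column sentinels: "two" is missing only in 'something',
# "foo" and "NA" only in 'message'
sentinels = {"message": ["foo", "NA"], "something": ["two"]}
result = pd.read_csv(io.StringIO(raw), na_values=sentinels)
print(result)
```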

Question 40

Which function is described as the pandas counterpart to `lxml.objectify` for reading XML data in a single expression?

Question 41

To read a file in chunks of 1000 rows, the code `chunker = pd.read_csv("examples/ex6.csv", chunksize=1000)` is used. How would you then process the 'key' column of each chunk to get a total value count?
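
One way to do this uses a hypothetical 10-row stand-in for 'examples/ex6.csv' and a smaller `chunksize`, so that several chunks are produced. Each chunk's `value_counts()` is added into a running total, and `fill_value=0` makes a key missing from either side count as zero.

```python
import io

import pandas as pd

# Hypothetical data: 10 rows whose 'key' values are A, B or C
raw = "key,value\n" + "".join(f"{k},{i}\n" for i, k in enumerate("ABACBAACAB"))
chunker = pd.read_csv(io.StringIO(raw), chunksize=4)  # yields 4, 4, 2 rows

tot = pd.Series([], dtype="int64")
for piece in chunker:
    # Align on key labels and add; absent keys are treated as 0
    tot = tot.add(piece["key"].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
print(tot)
```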

Question 42

When is it necessary to use Python's built-in `csv` module for manual processing instead of `pandas.read_csv`?

Question 43

What is the purpose of the `pandas.read_sql_table` function?

Question 44

What is the result of running `pd.merge(df3, df4, left_on="lkey", right_on="rkey")` when the `lkey` column in `df3` contains 'c' but the `rkey` column in `df4` does not?
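
This sketch uses hypothetical frames. `pd.merge` defaults to an inner join, so a key that appears in only one frame ('c' here) is dropped. Passing `how="outer"` keeps that key and fills the missing side with NaN.

```python
import pandas as pd

df3 = pd.DataFrame({"lkey": ["b", "b", "a", "c", "a", "a", "b"], "data1": range(7)})
df4 = pd.DataFrame({"rkey": ["a", "b", "d"], "data2": range(3)})

# Default how="inner": only keys present in both columns survive
merged = pd.merge(df3, df4, left_on="lkey", right_on="rkey")

# how="outer": keep unmatched keys from both sides
outer = pd.merge(df3, df4, left_on="lkey", right_on="rkey", how="outer")
print(merged)
```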

Question 45

In the example `pd.read_csv('examples/ex1.csv')`, how does pandas determine the column headers for the resulting DataFrame `df`?

Question 46

What is the primary function of the `sep` argument in `pandas.read_csv` and in the `DataFrame.to_csv` method?

Question 47

When writing a delimited file manually using Python's `csv.writer`, what does the `writerow` method do?
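
This sketch covers Questions 30 and 47 and writes hypothetical rows to an in-memory buffer. Each `writerow` call writes one delimited line. With `csv.QUOTE_MINIMAL` (the default), only fields that contain special characters, such as the delimiter, are quoted.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(
    buf,
    delimiter=";",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,  # quote only when a field needs it
    lineterminator="\n",
)
writer.writerow(("one", "two", "three"))
writer.writerow(("1", "2", "3"))
writer.writerow(("4", "has;semicolon", "6"))  # contains the delimiter
text = buf.getvalue()

# Reading it back recovers the original fields
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(text)
```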

Question 48

Which two add-on packages are mentioned as being used internally by pandas to read old-style XLS and newer XLSX Excel files, respectively?

Question 49

The pandas function `read_hdf` is described as a shortcut. What is it a shortcut for?

Question 50

In the final example of Chapter 6, the code `pd.read_sql("SELECT * FROM test", db)` is used. What is the type of the `db` object?
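
This sketch of `pandas.read_sql` assumes a plain `sqlite3` connection rather than a SQLAlchemy connectable. pandas accepts `sqlite3` connections directly, so the example runs without SQLAlchemy installed. The table contents are hypothetical.

```python
import sqlite3

import pandas as pd

db = sqlite3.connect(":memory:")

# Create and fill a hypothetical 'test' table from a DataFrame
pd.DataFrame({"a": ["x", "y"], "d": [6, 3]}).to_sql("test", db, index=False)

# Run a query and get the result set as a DataFrame
result = pd.read_sql("SELECT * FROM test", db)
db.close()
print(result)
```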
