Data Loading, Storage, and File Formats
50 questions available
Questions
When reading a CSV file that does not contain a header row using pandas, which argument should be passed to `read_csv` to prevent the first data row from being incorrectly used as column names?
To read the file 'examples/ex2.csv' and assign the column names 'a', 'b', 'c', 'd', and 'message', which is the correct syntax for the `pandas.read_csv` function?
When reading 'examples/ex2.csv', how can you specify that the 'message' column should become the index of the resulting DataFrame?
The file 'examples/ex3.txt' has fields separated by a variable amount of whitespace. What value should be passed to the `sep` argument in `pandas.read_csv` to correctly parse this file?
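For readers who want to experiment with the whitespace-delimited case, here is a minimal sketch. It does not read 'examples/ex3.txt'; the in-memory text, the column names, and the values are all made up for illustration. It relies on the fact that `sep` accepts a regular expression.

```python
import io

import pandas as pd

# Hypothetical stand-in for a whitespace-delimited file; the column names
# and values are invented and are not the contents of examples/ex3.txt.
raw = (
    "key      A      B      C\n"
    "aaa  -0.26  -1.02  -0.61\n"
    "bbb   0.92   0.30  -0.03\n"
    "ccc  -0.26  -0.38   0.75\n"
)

# A regular expression separator: one or more whitespace characters.
result = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(result)
```

Passing a single space as `sep` would treat every space as a field boundary, which is why a pattern that matches runs of whitespace is the usual choice here.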
The file 'examples/ex4.csv' contains data but also has commented lines that should be ignored on the first, third, and fourth rows of the file. How can you skip these specific rows using `pandas.read_csv`?
What is the function of the `na_values` option when used in the `pandas.read_csv` function?
In `pandas.read_csv`, what is the effect of setting the `keep_default_na` option to `False`?
When the `chunksize` argument is used with `pandas.read_csv`, what is the data type of the object that is returned?
What is the primary reason for using the `chunksize` or `iterator` arguments when reading a large file with pandas?
If the command `pd.read_csv("examples/ex6.csv", nrows=5)` is executed, how many rows of data, not including the header, will be read from the file and included in the resulting DataFrame?
When writing a pandas DataFrame to a text file, missing values (NaN) appear as empty strings by default. How can you specify that they should be written as the string 'NULL' instead?
Which combination of arguments for the `to_csv` method will write a DataFrame's data to a file without the row index and without the column headers?
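One way to check the effect of the `to_csv` arguments is to write into an in-memory buffer and inspect the text. The sketch below assumes a small made-up DataFrame (`frame` is a hypothetical name) and uses the `index` and `header` parameters of `DataFrame.to_csv`.

```python
import io

import pandas as pd

# Made-up data for illustration only.
frame = pd.DataFrame({"a": [1, 5], "b": [2, 6], "message": ["hello", "world"]})

# Write to an in-memory buffer instead of a file so the output can be inspected.
buffer = io.StringIO()
frame.to_csv(buffer, index=False, header=False)
text = buffer.getvalue()
print(text)
```

With both options disabled, only the data values are written, one row per line.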
To convert a JSON string into a Python object (like a dictionary or list), which function from Python's standard `json` library should be used?
What is the default assumption made by `pandas.read_json` when converting a JSON dataset into a DataFrame?
To export a pandas DataFrame to a JSON string using the `to_json` method, which value for the `orient` argument produces a JSON array of dictionaries, where each dictionary represents a row?
By default, what does the `pandas.read_html` function search for in an HTML document, and what data structure does it return?
In the example using `lxml.objectify` to parse the MTA performance XML file, what type of object is returned by `root.INDICATOR`?
What is the primary drawback of using the pickle format for long-term data storage, as mentioned in the chapter?
For reading tabular data from a Microsoft Excel file that may contain multiple sheets, which pandas class can be used to create an object that facilitates parsing data from specific sheets?
What is the purpose of the `HDFStore` class in pandas?
The HDFStore object supports two storage schemas: 'fixed' and 'table'. What is the key difference between them as described in the chapter?
What is the purpose of the `store.put()` method in the context of an HDFStore object?
When interacting with a web API using the `requests` library, which method should be called on the response object to get the parsed JSON data as a Python object?
When selecting data from a table using a standard Python SQL driver like `sqlite3`, what does the `cursor.fetchall()` method typically return?
When creating a pandas DataFrame from the results of a `sqlite3` query, besides the data rows, what other piece of information is required from the cursor object to correctly label the columns?
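The two `sqlite3` questions above can be explored with an in-memory database. This is a sketch under stated assumptions: the table name `test`, its columns, and the inserted rows are invented for illustration. The calls used (`fetchall` and `cursor.description`) are part of the standard Python DB-API.

```python
import sqlite3

import pandas as pd

# In-memory database with a made-up table, so nothing is written to disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a VARCHAR(20), b VARCHAR(20), c REAL, d INTEGER)")
rows_in = [("Atlanta", "Georgia", 1.25, 6), ("Tallahassee", "Florida", 2.6, 3)]
con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows_in)
con.commit()

cursor = con.execute("SELECT * FROM test")
rows = cursor.fetchall()  # a list of tuples, one tuple per row

# Each entry of cursor.description is a 7-item sequence whose first item
# is the column name.
column_names = [col[0] for col in cursor.description]
frame = pd.DataFrame(rows, columns=column_names)
con.close()
print(frame)
```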
What is the primary advantage of using SQLAlchemy in conjunction with pandas for database operations, as described in the chapter?
In the example `pd.read_csv("examples/csv_mindex.csv", index_col=["key1", "key2"])`, what kind of index is created in the resulting DataFrame?
According to Table 6-2, which argument in `pandas.read_csv` is used to provide a dictionary mapping column names or numbers to functions that should be applied to the data in those columns during parsing?
What is a key characteristic of data formats like HDF5, ORC, and Parquet, as contrasted with text formats like CSV?
In the `csv` module, what does the `quoting` dialect option `csv.QUOTE_MINIMAL` specify?
In the example of parsing the `fdic_failed_bank_list.html` file, `len(tables)` returns 1. Why does `pandas.read_html` return a list with one element instead of just a single DataFrame?
What is the function of `pandas.read_pickle`?
To write a pandas DataFrame to an Excel file, the text describes a process involving an `ExcelWriter` object. Which of the following is NOT part of that process?
What does the 'HDF' in HDF5 stand for?
According to the text, what is a major benefit of using the HDF5 format for working with datasets that don't fit into memory?
After making an HTTP GET request with the `requests` library, what is the 'good practice' recommended by the text to check for HTTP errors?
Which pandas function is specifically designed to read data from a file in a fixed-width column format (i.e., with no delimiters)?
What does the code `data.to_csv(sys.stdout, sep='|')` do?
When parsing 'examples/ex5.csv', a dictionary is passed to `na_values`: `sentinels = {"message": ["foo", "NA"], "something": ["two"]}`. What is the effect of this?
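The `sentinels` dictionary from the question can be tried against an in-memory table. The CSV text below is a made-up stand-in, not the actual contents of 'examples/ex5.csv'. Only the `sentinels` mapping is taken from the question.

```python
import io

import pandas as pd

# Hypothetical data shaped so that each sentinel appears in its own column.
raw = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# Per-column NA sentinels, as given in the question.
sentinels = {"message": ["foo", "NA"], "something": ["two"]}
result = pd.read_csv(io.StringIO(raw), na_values=sentinels)

# Boolean mask showing which cells were parsed as missing.
print(result.isna())
```

Each list of sentinels applies only to the column named by its key, while the default missing-value markers (such as the empty field in column `c`) still apply everywhere.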
Which function is described as the pandas counterpart to `lxml.objectify` for reading XML data in a single expression?
To read a file in chunks of 1000 rows, the code `chunker = pd.read_csv("examples/ex6.csv", chunksize=1000)` is used. How would you then process the 'key' column of each chunk to get a total value count?
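A sketch of one possible chunked aggregation follows, assuming a tiny made-up table and a chunk size of 6 rather than 1000 so that several chunks are produced. The data and the name `tot` are illustrative, and the approach (accumulating per-chunk `value_counts` with `Series.add`) is one option among several.

```python
import io

import pandas as pd

# Made-up table with 20 rows and a 'key' column cycling through A, B, C, A, B.
raw = "key,value\n" + "".join(f"{k},{i}\n" for i, k in enumerate("ABCAB" * 4))

# chunksize makes read_csv return an iterator of DataFrames instead of one frame.
chunker = pd.read_csv(io.StringIO(raw), chunksize=6)

tot = pd.Series([], dtype="int64")
for piece in chunker:
    # fill_value=0 treats keys missing from either side as a count of zero.
    tot = tot.add(piece["key"].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
print(tot)
```

Only one chunk is held in memory at a time, so the same loop scales to files much larger than this example.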
When is it necessary to use Python's built-in `csv` module for manual processing instead of `pandas.read_csv`?
What is the purpose of the `pandas.read_sql_table` function?
What is the result of running `pd.merge(df3, df4, left_on="lkey", right_on="rkey")` when the `lkey` column in `df3` contains 'c' but the `rkey` column in `df4` does not?
In the example `pd.read_csv('examples/ex1.csv')`, how does pandas determine the column headers for the resulting DataFrame `df`?
What is the primary function of the `sep` argument in `pandas.read_csv` and `to_csv`?
When writing a delimited file manually using Python's `csv.writer`, what does the `writerow` method do?
Which two add-on packages are mentioned as being used internally by pandas to read old-style XLS and newer XLSX Excel files, respectively?
The pandas function `read_hdf` is described as a shortcut. What is it a shortcut for?
In the final example of Chapter 6, the code `pd.read_sql("SELECT * FROM test", db)` is used. What is the type of the `db` object?