Data Analysis Examples

Questions

Question 1

In the context of the Bitly data analysis example, what is the primary purpose of using `json.loads` within a list comprehension when reading the data file?
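As a minimal sketch of the pattern this question refers to, the snippet below parses line-delimited JSON with `json.loads` inside a list comprehension. The two sample lines are hypothetical stand-ins for lines read from the data file, not actual Bitly records.

```python
import json

# Hypothetical sample: two JSON records, one per line, standing in for
# lines read from a line-delimited JSON file.
lines = [
    '{"tz": "America/New_York", "a": "Mozilla/5.0"}',
    '{"a": "GoogleMaps/RochesterNY"}',
]

# Each line is a JSON string; json.loads converts it into a Python dict.
records = [json.loads(line) for line in lines]

print(type(records[0]))
print(records[0]["tz"])
```

Note that the second sample record has no `"tz"` key, so code that indexes every record with `rec["tz"]` would need to account for that.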

Question 2

When initially trying to extract time zones from the Bitly dataset using `[rec["tz"] for rec in records]`, a `KeyError` occurs. Why does this error happen?

Question 3

In the Bitly data analysis, how is the issue of missing and empty string time zones handled before creating a final visualization with seaborn?

Question 4

How can you decompose the Bitly time zone data into Windows and non-Windows users and then reshape it into a summary table?

Question 5

In the MovieLens 1M dataset analysis, what is the primary reason for merging the `ratings`, `users`, and `movies` DataFrames into a single DataFrame named `data`?

Question 6

How are movies that received at least 250 ratings identified in the MovieLens data analysis?

Question 7

What does the `explode` method accomplish when used on the 'genres' column in the MovieLens dataset?

Question 8

In the US Baby Names analysis, what is the purpose of passing `ignore_index=True` to `pd.concat` when assembling the yearly data files?
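The effect of `ignore_index=True` can be seen on a toy example. The two small frames below are hypothetical placeholders for per-year files; the names and counts are invented for illustration only.

```python
import pandas as pd

# Hypothetical per-year pieces; each gets its own default 0..n-1 index.
pieces = [
    pd.DataFrame({"name": ["A", "B"], "births": [10, 5], "year": 1880}),
    pd.DataFrame({"name": ["A", "C"], "births": [8, 7], "year": 1881}),
]

# Without ignore_index, the original row labels are kept and repeat.
default = pd.concat(pieces)

# With ignore_index=True, the result gets a fresh 0..n-1 index.
renumbered = pd.concat(pieces, ignore_index=True)

print(default.index.tolist())
print(renumbered.index.tolist())
```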

Question 9

How is the 'prop' column, representing the proportion of babies with a given name for a specific year and sex, calculated in the US Baby Names dataset?

Question 10

What does the code `prop_cumsum.searchsorted(0.5)` accomplish in the analysis of naming diversity in the US Baby Names dataset?
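A small sketch of the call in this question, using made-up proportions rather than the real dataset. It assumes `prop_cumsum` is the cumulative sum of proportions sorted in descending order.

```python
import pandas as pd

# Hypothetical proportions for five names, sorted in descending order.
prop = pd.Series([0.30, 0.25, 0.20, 0.15, 0.10])

# Running total: roughly 0.30, 0.55, 0.75, 0.90, 1.00.
prop_cumsum = prop.cumsum()

# Zero-based position where 0.5 would be inserted to keep the values sorted.
pos = prop_cumsum.searchsorted(0.5)

# Because positions are zero-based, the number of names needed is pos + 1.
names_needed = int(pos) + 1
print(pos, names_needed)
```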

Question 11

In the USDA Food Database example, how is the complete `nutrients` DataFrame constructed from the nested JSON data?

Question 12

What is the purpose of renaming the 'description' and 'group' columns in both the `info` and `nutrients` DataFrames in the USDA food analysis?

Question 13

How can you find the food with the highest amount of a given nutrient for each nutrient group in the USDA dataset?

Question 14

In the 2012 Federal Election Commission (FEC) data analysis, how is a 'party' column added to the DataFrame?

Question 15

What is the purpose of bucketing the donation amounts using `pd.cut` in the FEC data analysis?
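For readers unfamiliar with `pd.cut`, here is a minimal sketch of bucketing amounts into intervals. The amounts and bin edges are hypothetical and are not taken from the FEC data.

```python
import numpy as np
import pandas as pd

# Hypothetical donation amounts and bin edges.
amounts = pd.Series([5, 50, 250, 5000])
bins = np.array([0, 10, 100, 1000, 10000])

# Each amount is assigned to the half-open interval (left, right] it falls in.
labels = pd.cut(amounts, bins)

# Count how many amounts land in each bucket.
counts = labels.value_counts(sort=False)
print(labels)
print(counts)
```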

Question 16

After grouping the FEC data by candidate and donation bucket, the code shows `bucket_sums.div(bucket_sums.sum(axis="columns"), axis="index")`. What does this operation calculate?
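The mechanics of this expression can be checked on a toy table. The layout below (donation buckets as rows, candidates as columns) and all of its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical totals: rows are donation buckets, columns are candidates.
bucket_sums = pd.DataFrame(
    {"Candidate A": [10.0, 30.0], "Candidate B": [30.0, 30.0]},
    index=["(0, 10]", "(10, 100]"],
)

# One total per row (summing across the columns).
row_totals = bucket_sums.sum(axis="columns")

# Divide each row by its own total, aligning the totals on the row index.
normed = bucket_sums.div(row_totals, axis="index")
print(normed)
```

After the division, every row of `normed` sums to 1.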

Question 17

In the Bitly data analysis, what is the value of the 'America/New_York' time zone count after running `tz_counts = frame["tz"].value_counts()`?

Question 18

In the MovieLens dataset, which movie has the largest negative rating difference, indicating it was preferred much more by female viewers than male viewers?

Question 19

According to the US Baby Names analysis, how many of the most popular boy names in 1900 were required to make up 50 percent of the total male births?

Question 20

In the FEC data analysis, which occupation represents the highest total donation amount for the candidate 'Romney, Mitt' among the top 7 occupations listed?

Question 21

In the MovieLens 1M Dataset, what is the result of using `data.pivot_table("rating", index="title", columns="gender", aggfunc="mean")`?

Question 22

What is the primary characteristic of the 'last letter' revolution analysis in the US Baby Names dataset?

Question 23

In the Bitly data analysis, to normalize the counts of Windows vs. non-Windows users for each time zone to sum to 1, which pandas method is shown to be more efficient than using `apply`?

Question 24

What is the primary data structure of the `db` object after loading the USDA food database with `json.load`?

Question 25

In the analysis of the name 'Lesley' and its variants in the US Baby Names dataset, what does the final plot generated by `table.plot(style={"M": "k-", "F": "k--"})` show?

Question 26

In the FEC data analysis, how are various spellings and phrasings for occupations like 'INFORMATION REQUESTED' and 'C.E.O.' cleaned up?

Question 27

In the MovieLens analysis, which movie is identified as the most divisively rated, based on the standard deviation of its ratings?

Question 28

What is the total number of rows in the final `names` DataFrame after concatenating all US Baby Names files from 1880 to 2010?

Question 29

In the USDA food database, what is the food group with the highest median Zinc (Zn) value according to the bar plot?

Question 30

How many records (rows) are in the MovieLens 1M dataset's `ratings` table before any merging?

Question 31

Which pandas operation is used to create the `total_births` pivot table in the US Baby Names analysis, showing total births by year and sex?

Question 32

In the Bitly data analysis, what does `agg_counts.sum("columns").argsort()` compute?

Question 33

When analyzing the 2012 FEC data, after bucketing donations, how many donations did 'Obama, Barack' receive in the (10, 100] dollar bucket?

Question 34

What is the data type (Dtype) of the 'contb_receipt_amt' column in the FEC dataset after being loaded by `pd.read_csv`?

Question 35

Which food is identified as having the most 'Alanine' in the USDA food database analysis?

Question 36

In the US Baby Names analysis, what is the proportion of male births in 1910 that had names ending in the letter 'd'?

Question 37

Which Python library and specific class are used to efficiently count time zones in the Bitly data example as an alternative to a manual dictionary loop?

Question 38

In the MovieLens dataset analysis, what is the engine specified in the `pd.read_table` function and why might it be necessary?

Question 39

In the analysis of the US Baby Names, what is the purpose of the `get_top1000` function?

Question 40

What does the code `fec[fec["contb_receipt_amt"] > 0]` achieve in the FEC data analysis?

Question 41

In the Bitly data analysis, what is the first token obtained by splitting the agent string `frame["a"][1]`?

Question 42

What is the total number of non-null values in the 'manufacturer' column of the USDA food database `info` DataFrame?

Question 43

In the MovieLens analysis, what is the mean rating for the 'Action' genre by viewers in the '18' age group?

Question 44

In the US Baby Names analysis, how many births were there for the name 'Mary' for sex 'F' in the year 1880?

Question 45

What is the total donation amount from the state of California ('CA') to 'Obama, Barack' in the FEC dataset analysis?

Question 46

What is the primary motivation for creating the `fec_mrbo` subset in the FEC data analysis?

Question 47

In the MovieLens analysis, the code `movies["genre"] = movies.pop("genres").str.split("|")` performs two actions. What are they?
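To experiment with this line, it can be run on a tiny stand-in table. The frame below is hypothetical (invented titles and IDs), and the follow-up `explode` call is included only to show how the resulting column can then be expanded to one row per genre.

```python
import pandas as pd

# Hypothetical stand-in for the movies table.
movies = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "genres": ["Animation|Comedy", "Drama"],
})

# The line from the question, applied to the toy table.
movies["genre"] = movies.pop("genres").str.split("|")
print(movies.columns.tolist())
print(movies["genre"][0])

# One row per (movie, genre) pair.
exploded = movies.explode("genre")
print(len(exploded))
```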

Question 48

What is the count of the 'Vegetables and Vegetable Products' food group in the USDA database?

Question 49

In the US Baby Names analysis, how is the trend of the proportion of boys born with names ending in 'd', 'n', and 'y' plotted over time?

Question 50

Which aggregation function is used by default when creating a `pivot_table` in pandas if `aggfunc` is not specified?
