Books versus Movies - Data Science for User Researchers
Human Centered Design & Engineering, Master's Program
[Original] Research Questions:
- Do more people read a book before or after a movie based on that book has come out?
- Which have better reviews: books or the movies based on those books?
Class Objective:
Introduces widely adopted programming and data science tools and uses data to answer questions about the characteristics, behaviors, and needs of people who use a wide variety of products.
- Write or modify a program to collect a dataset from Wikipedia or the City of Seattle’s open data portal (Data.Seattle.gov)
- Effectively read web API documentation and write Python software to parse and understand a new and unfamiliar JSON-based web API
- Understand database schemas and use MySQL to extract user data from relational databases
- Use web-based data to effectively answer a substantively interesting question and to present this data effectively in the context of both a formal presentation and a written report
Process:
- Develop intriguing initial research questions based on personal interest and existing APIs with relevant data
- [Re]Learn Python to grab and manipulate API data in XML and JSON formats
- Re-evaluate and scope research questions based on available API data and formatting issues
- Use Python to pull API data from the sources into Excel
- Analyze data in Excel using graphs
- Analyze data in Tableau
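The collection step in the process above (API data into a spreadsheet) can be sketched roughly as follows. This is a minimal illustration and not the code used for the project: the field names (`title`, `release_year`, `genre`) and the sample records are hypothetical, and the sketch produces CSV text, which Excel can open, rather than a native Excel file.

```python
import csv
import io
import json

# Hypothetical JSON payload standing in for an already-downloaded API response.
SAMPLE_JSON = """
[
  {"title": "Example Film A", "release_year": 1999, "genre": "drama"},
  {"title": "Example Film B", "release_year": 2005, "genre": "comedy"}
]
"""


def records_to_csv(records, fieldnames):
    """Return CSV text for a list of dicts; missing keys become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # fill in blanks for missing fields
        extrasaction="ignore",  # silently drop fields we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = json.loads(SAMPLE_JSON)
csv_text = records_to_csv(records, ["title", "release_year", "genre"])
print(csv_text)
```

Writing `csv_text` to a file ending in `.csv` would give a sheet that opens directly in Excel or Tableau.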
Revised Research Questions:
Based on the limitations and restrictions of the data, along with time constraints, I had to adjust my research questions to the following:
- Is there a correlation between the Goodreads rating for a book that has become a movie and the Wikipedia page edits for that movie?
- What genres of books are most commonly adapted to films?
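One way the first revised question could be checked numerically is with a Pearson correlation coefficient between the two columns. The sketch below is an assumption about how that might be done in Python; the project itself used Excel and Tableau for analysis, and the numbers here are made-up placeholders, not project data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder values only: average book rating and film page edit count.
example_ratings = [3.5, 4.0, 4.2, 3.8]
example_edit_counts = [120, 90, 300, 150]
r = pearson_r(example_ratings, example_edit_counts)
print(f"r = {r:.2f}")
```

A value of `r` near 0 would suggest no linear relationship, while values near 1 or -1 would suggest a strong one.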
APIs Used:
- Wikipedia
  - Retrieve list of films based on books
  - Gather film name, release date, genre, and edit counts for each film page
- Goodreads
  - Retrieve book from list of film titles
  - Gather book title, average rating, number of ratings, and original publication year
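For the Wikipedia side, retrieving the base list of films can be sketched with the MediaWiki API's `categorymembers` query. The helper names (`category_members_url`, `extract_titles`) and the sample response are hypothetical, and the sketch only builds the request URL and parses a response that has already been downloaded. It is not the project's actual code and does not cover paging through large categories or the Goodreads lookup.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=50):
    """Build a MediaWiki API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_titles(response_text):
    """Return article titles (namespace 0) from a categorymembers response."""
    data = json.loads(response_text)
    members = data.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members if m.get("ns") == 0]


# Hypothetical response body; the titles are placeholders, not real results.
SAMPLE_RESPONSE = """
{"query": {"categorymembers": [
  {"pageid": 1, "ns": 0, "title": "Example Film A"},
  {"pageid": 2, "ns": 14, "title": "Category:Example subcategory"},
  {"pageid": 3, "ns": 0, "title": "Example Film B"}
]}}
"""

print(category_members_url("Films based on novels"))
print(extract_titles(SAMPLE_RESPONSE))
```

Filtering on `ns == 0` keeps regular articles and drops subcategories, which the same query also returns.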
Findings:
- No correlation was found between the Goodreads rating for a book that has become a movie and the Wikipedia page edits for that movie. See the Tableau graph below.
- The genre of book most commonly adapted to film is drama, followed by comedy and romance. See the Excel graph below.
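The genre ranking was produced in Excel; as an alternative illustration, the same kind of tally could be sketched in Python with `collections.Counter`. The genre labels below are placeholders chosen for the example, not the project's data, and the lowercase/strip normalization is an assumption about how inconsistent labels might be handled.

```python
from collections import Counter


def tally_genres(genres):
    """Count genre labels, ignoring case and surrounding whitespace."""
    cleaned = (g.strip().lower() for g in genres if g and g.strip())
    return Counter(cleaned)


# Placeholder labels, one per film.
example_genres = ["Drama", "comedy", "drama ", "Romance", "Drama", "Comedy"]
counts = tally_genres(example_genres)

# List genres from most to least common.
for genre, count in counts.most_common():
    print(f"{genre}: {count}")
```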
Limitations:
- Could not verify that a matched book was actually the source for the identified film; matching relied on exact title equality instead
- Limited recent data in the Wikipedia category "Films based on novels", which was used for the base list
- Unknown relationship and lack of data to compare Goodreads readers and Wikipedia editors
- Few or no Wikipedia pages existed for the books / novels that matched the list of "Films based on novels"
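The title-matching limitation can be made concrete with a small sketch. The helper names (`normalize_title`, `match_films_to_books`) are hypothetical, and the normalization shown (ignoring case, extra whitespace, and a trailing Wikipedia-style disambiguator such as "(1999 film)") is an assumption added for illustration; the project matched names exactly.

```python
import re


def normalize_title(title):
    """Lowercase a title, collapse whitespace, drop a trailing '(... film)'."""
    title = re.sub(r"\s*\([^)]*\bfilm\)\s*$", "", title, flags=re.IGNORECASE)
    return " ".join(title.lower().split())


def match_films_to_books(film_titles, book_titles):
    """Map each film title to a book whose normalized title is identical."""
    books_by_key = {normalize_title(b): b for b in book_titles}
    matches = {}
    for film in film_titles:
        book = books_by_key.get(normalize_title(film))
        if book is not None:
            matches[film] = book
    return matches


# Placeholder titles, not project data.
films = ["Example Story (1999 film)", "Another Tale"]
books = ["example story", "A Different Book"]
print(match_films_to_books(films, books))
```

Even with normalization, a title match alone does not confirm that the film was adapted from that particular book, so the limitation still applies.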