This repository contains the instructions and base data for a technical assessment. Its purpose is to gauge your technical abilities and to understand how you approach problem solving. How you arrived at a solution is as important as the solution itself.
We'd like you to answer the following questions using the method of your choice (Python, SQL, or Excel).
If you are comfortable with more than one, feel free to share additional solutions using a different method.
- Which artist in this data set lived the longest?
- Who are the top 10 artists by the number of artworks?
- How would you find the artist that created the most artwork by total surface area?
- Did any artists have artwork acquired during their lifetime?
- Please review the quality of the data. What issues did you identify? How might you solve some of these issues to better work with the data?
The data used for this assessment is sourced from the Museum of Modern Art data on Kaggle: https://www.kaggle.com/momanyc/museum-collection
The artworks data set contains 130,262 records, representing all of the works that have been accessioned into MoMA’s collection and cataloged in our database. It includes basic metadata for each work, including title, artist, date, medium, dimensions, and date acquired by the Museum. Some of these records have incomplete information and are noted as “not curator approved.” The artists dataset contains 15,091 records, representing all the artists who have work in MoMA's collection and have been cataloged in our database. It includes basic metadata for each artist, including name, nationality, gender, birth year, and death year.
To make sure that this test can run on any machine, we're using SQLite: https://www.sqlite.org/index.html
We have provided an example Python notebook showing how to connect to and query the database using SQL. The data is also provided in an Excel file.
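For orientation, here is a minimal sketch of connecting to and querying a SQLite database from Python with the standard-library `sqlite3` module. It is not a copy of the provided notebook. The table `example_artists` and its rows are made-up placeholders, and an in-memory database stands in for the real file so the sketch runs anywhere. Against the real data you would pass the path of the database file in this repository to `sqlite3.connect` instead.

```python
import sqlite3

# Assumption: an in-memory database stands in for the real database file.
# For the assessment, pass the path to the provided file instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be read by column name

# Placeholder schema and rows for illustration only, NOT the real MoMA data.
conn.execute("CREATE TABLE example_artists (name TEXT, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO example_artists (name, birth_year) VALUES (?, ?)",
    [("Placeholder A", 1900), ("Placeholder B", None)],
)

# First step against an unfamiliar database: list the tables it contains.
tables = [
    row["name"]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# A parameterised query. Rows whose birth_year is NULL do not match the
# comparison, so they are silently left out of the result.
rows = conn.execute(
    "SELECT name, birth_year FROM example_artists WHERE birth_year >= ?",
    (1850,),
).fetchall()

# Count the rows with a missing value explicitly.
missing = conn.execute(
    "SELECT COUNT(*) FROM example_artists WHERE birth_year IS NULL"
).fetchone()[0]

print(tables)
print([dict(r) for r in rows])
print(missing)
conn.close()
```

Listing the tables first and checking for NULLs separately is a design choice in this sketch: comparisons in SQL skip NULL values, so missing data can go unnoticed unless it is counted explicitly.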
If you use Python, we would prefer that you respond using a Jupyter notebook. This allows you to annotate your response as required and makes it easier to evaluate.
If you haven't used Jupyter before, here's how to get yourself set up: https://jupyter.readthedocs.io/en/latest/install.html#new-to-python-and-jupyter
You may also use Excel or SQL alone. If this is your preference, please ensure your submission is well annotated, including any intermediate steps needed to answer each question.
Please send your answer in an email to Matthew and include any supporting files or scripts necessary to understand how you arrived at those answers.
Your response should be well annotated in the chosen format. If you use Excel or SQL, be sure to provide both the answer and the workings for each question.