point-grouping-test

A coding challenge for a Platform Engineer.

Instructions

Your solution should be presented as a standalone script we can run from command line using different input data files, and obtain corresponding output data files.

Please include instructions, comments and anything you deem relevant for us to evaluate your solution.

You can submit it as a Github repository or compressed file.

The main goal: grouping

points.json is a JSON data file that contains a list of objects representing geographic points, with a lat, lon and id. Your job is to write a Python script that will take an integer argument n, then read the file and output the ids of all the points, grouped into n groups, into a second file, groups.json.

Given that the data set represents geographic data, you should try to group points by proximity, so that the geographically closest points are grouped together. You should end up with a clean partitioning of the overall space.

Stretch goal: distribution

When grouping addresses to produce our daily routes, it's important that each van has roughly the same number of stops to make in a day. Do the same as above, with one final optimization - optimize for both geographical proximity, using the function in the first stretch goal, and for equal group size for each van.

About

This repo contains the instructions and template to complete a coding challenge in Python. It is aimed at Python programmers of mid-level experience and up, and is designed to test the programmer's skill in designing and implementing algorithms.

No external libraries are required—however, due to the algorithmic/numerical nature of the code at hand, Python libraries that provide robust implementations of various mathematical or statistical algorithms might be useful. The programmer should feel free to calibrate their approach to their own experience level.

Please do not hesitate to ask us any questions before submitting your solution!

My Solution

I implemented 2 different algorithms for this problem, k-means and DBSCAN. K-means was a better fit for how this problem was supposed to be solved, as it always assigns every point to an nth cluster. With the DBSCAN, due to how disparate some of the points where, it would mark some points as "Noise", and not assign them to any cluster, but this may have been more of an issue with selecting the proper ε for the problem at hand. Either algorithms can be used inside of the group_points.py file, by importing the classes and initializing them with the right parameters. DBSCAN is initialized with all the points as a python dict, and k-means is initialized with the points as a dict, k(or n) as an integer specifying the amount of clusters, and the iterations as an integer specifying how many times you want k-means to run.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.gitignore		.gitignore
README.md		README.md
dbscan.py		dbscan.py
group_points.py		group_points.py
kmeans.py		kmeans.py
points.json		points.json
results2.json		results2.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

point-grouping-test

Instructions

The main goal: grouping

Stretch goal: distribution

About

My Solution

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

point-grouping-test

Instructions

The main goal: grouping

Stretch goal: distribution

About

My Solution

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages