# Project Description [Muso Zhe Fan, Daniel M Cooper]


## Research Topic
The amount of information we have and the variety of sources generating it is staggering. Consumer products the world over make an incredibly wide range of personal data available. Only a few years ago personal computing was relegated to a handful of devices - tablets, cell phones, beepers. Today, personal computing is ubiquitous. Deep market penetration no longer rules the day, as new models of distributed computing upend the distinction between central computing - information generated and managed by discrete institutional or corporate entities - and the periphery, which simultaneously divides and produces information endlessly.
Data is increasingly valuable: its cottage industries are sprawling, and it piques the interest of industries that only a few years ago would not have known what to do with data, let alone paid into the billions of dollars that the data trade now generates every year.
How to analyze data is nonetheless a fraught endeavor. How can data about personal choices throughout the day - to take the stairs or the elevator, for example - be used by inquiring minds? Concomitant with data’s becoming ubiquitous is a need to group data, to read its trends, to palpate the reality that can’t be seen in the numbers through the streams we can see. Are the extra steps a sign of a personal choice or a broken escalator? Data is a great way to see relationships.
Data is generated by people, and it is generated by objects. More importantly, data is generated when people and objects interact. Data is, more than any other field, increasingly the site of hybrid production.
Developing in parallel to changes in data have been massive changes in how we understand cities. The full purview of those changes is beyond the scope of this research, but what is happily not only within reach but also valuable and productive is the generation of urban fabric through data production and analysis. Cities are an incredible source of data that can shed light on the lives of individuals as much as on the corners where informal economies exist.

## Research Scope
Yellow Detective is a project in development, an MVP that exploits ubiquitous if often mundane data production to reveal novel connections integral to the function of cities. Yellow Detective makes good on the promise of a clearer vision of the present and the future made possible by data. It is the health salve that clears the smoke from our detective’s office and the bionic eyepiece that lets us see through walls.
The project begins by asking questions about the relationship of a person to a place. Through exploratory research techniques, YD creates layers of data to reveal connections and, more importantly, to generate new questions, new ways of inquiring, and new means of visualization.
We began by asking simple questions about how big data, iterative processing, and critical visual studies could help us see the forces subtending our cities. Though we started with simple questions about social networks and nebulous data, we sought to see better the networks that we engage with daily without having much sense of them.
How do these vast if informal economies operate in cities, and how can we ask questions about them? Calling is only one app on the phones of the grey economy; surely we can access it through some other means. How can we use data to ask questions about informal economies surreptitiously?
We began our research in Dongguan, a city between Shenzhen to its south and Guangzhou to its north. Known for its light industries, Dongguan is more relaxed than its neighbors, a tech hub and provincial capital respectively.
The slow life, however, belies a vast and complex network of relationships. While those relationships are never hidden from sight, they operate with tacit consent, under pseudonyms, and without the explicit regulation of the law. They negotiate ways of being available without being seen.
While informal economies rarely state their name and age, they do sometimes appear out in the open. These moments where grey economies slip from the shadows are opportunities for research. So when, on February 14, 2014, police in China made news by raiding a hotel in Dongguan known as a central node for the city’s wide range of prostitution, we put Yellow Detective to work knocking on doors and following the lead.
An article in the Hong Kong-based South China Morning Post relates the details of the raid: a 5-star hotel run by one of China’s nouveau riche had been raided by police. Inside they found, to no one’s surprise, many women working as prostitutes. The article goes on to explain that prostitution makes up some 10% of the local GDP, employing 300,000 people. We know, however, that this number must be unfathomably low, and SCMP suggests it knows too. Further along in the article they report, “The sex trade relates directly or indirectly to many sectors including hotels, condoms, restaurants, cosmetics, daily goods, travel and other things.” SCMP had reported the news and identified a node, a whole invisible network coming up to breathe. Certainly prostitution isn’t news, so why did the paper not pursue the real story? Perhaps they need Yellow Detective.
Prostitution is directly tied to urban issues. Users of China’s Sina Weibo are quoted in the same SCMP article castigating the police: “If you have guts, go do a secret investigation on forced demolitions…” It’s worth noting not only that users saw that prostitution and urban development are tied together, but that they chose to identify this relationship via social media.











## Hypothesis

Our hypothesis is that on an urban scale, a city is like an organism: places are like its cells, but the places do not work individually. They form a comprehensive network in which they organize and collaborate, and on a broader scale they present features resembling a tissue or organ. Their relationships can be complicated: certain places might be the service area for another place, or they could be its substitute or complement. This relationship is often hidden behind the city, but places are normally connected by people who keep moving between them. Therefore, by observing people’s movement, a relationship network of places emerges at the urban scale.


## Methodology

We start with a certain place (originPlace) and find all the people who have checked in there (set A). Then we find all the places in the PRD where the people in set A have checked in (set B). By drawing lines from originPlace to the points in set B, we can visualize this relationship network; some places (connectedPlace) will emerge that have a relatively high concentration of check-ins from people in set A. In that case, we can say that one place has a strong correlation with another.
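The methodology can be sketched in plain Python; the `checkins` list of (user, place) pairs below is a made-up stand-in for the Weibo check-in records in the database, not real data:

```python
# Minimal sketch of the originPlace -> set A -> set B methodology.
from collections import Counter

def connected_places(checkins, origin_place):
    # Set A: everyone who checked in at originPlace.
    set_a = {user for user, place in checkins if place == origin_place}
    # Set B: every other place where people in set A checked in,
    # counted so high-concentration places (connectedPlace) stand out.
    set_b = Counter(place for user, place in checkins
                    if user in set_a and place != origin_place)
    return set_a, set_b

checkins = [("u1", "hotel"), ("u2", "hotel"), ("u1", "mall"),
            ("u2", "mall"), ("u2", "station"), ("u3", "park")]
set_a, set_b = connected_places(checkins, "hotel")
print(set_a)                 # users who visited the hotel
print(set_b.most_common())   # places ranked by shared visitors
```

Places with the highest counts in set B are the ones we would call strongly correlated with originPlace.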


## Minimum Viable Product

We first wanted to focus mainly on the latent grey industry in Dongguan, so we chose as originPlace a hotel that is very famous for this service, and we set a time limit from 9pm to 4am. Our assumption is that people who check in at this place during these very late hours are quite likely clients of the sex trade, so the places these clients link to might have an interesting relationship with this hotel.

After a test, we found that this single place did not give us a good result, so we changed originPlace to Houjie, a larger area full of this service rather than only one hotel, which gave us a very good result. From the final map, we can see that Houjie has a strong relationship with Hong Kong, Macau and Guangzhou; there is a large correlation with Shenzhen, though not as much as with the three places mentioned before; and certain smaller cities also emerge, such as Foshan, Jiangmen and Zhaoqing. On the other hand, we can see that Houjie has a really big service radius across the whole PRD region.

We then extended this methodology to the whole PRD region, so you can move to anywhere you want to examine and press the update button, which sends the lat/lng data back to the server to show the result. Certain areas like Guangzhou have huge amounts of data, so we limit set A to 60 on the server in case the user has to wait too long. We want to provide a time-selection option in future development, letting the user decide what time range to query, because some urban services vary dramatically as time changes.
# Data Processing [Xi Chen, Lindasy Friedman]

The database we used is the Weibo Dataset, and we looked into the check-in information to geo-locate the places where prostitution may have happened.
In the first test, we wanted to create a subset of the Weibo data in order to shrink the range of data and provide a more stable query process. In this round, we targeted the Dongguan Prince Hotel and queried the check-in information based either on the geo-location of the spot or on a checkin_text containing the words 'dongguan (东莞)' and 'prince (太子)'.
The filtering code is as follows:

```python
def filter_database():
    # Match check-ins whose text mentions both Dongguan (东莞) and Prince (太子).
    query = 'SELECT FROM Checkins WHERE text CONTAINSTEXT "东莞" AND text CONTAINSTEXT "太子"'
    records = client.command(query)  # records holds all matching check-in records
    cluster_id = 3  # unused in this query
    rec_list = []
    for rec in records:
        rec_list.append(rec)
    print rec_list
    client.db_close()
```
The results of this code were quite limited - only about 20 check-ins came out - so at this stage we kept trying different types of query in OrientDB to find the better results we wanted.
We also wrote queries based on a time range in order to filter out a specific type of data. For instance, we wanted to query the data from 8:00pm to 4:00am every day, which is more related to the yellow industry. The following UNION syntax checks the specific time range:
```sql
SELECT * FROM Checkin WHERE lat BETWEEN 22.53 AND 22.56 AND lng BETWEEN 114.04 AND 114.08 AND time REGEXP "^2014-[01][0-9]-[0-3][0-9]\s2[0-3]:.*$"

UNION

SELECT * FROM Checkin WHERE lat BETWEEN 22.53 AND 22.56 AND lng BETWEEN 114.04 AND 114.08 AND time REGEXP "^2014-[01][0-9]-[0-3][0-9]\s0[0-3]:.*$"
```

This query language cannot be consumed directly by Python, and the results may need further filtering given the uncertainty about the type of each check-in.
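One option for that further filtering is to do the time-window test in Python after the query returns. This is a sketch under the assumption that `time` fields are strings like "2014-02-14 22:15:00"; the function name and default hours are illustrative:

```python
# Filter check-in times in Python instead of with an SQL REGEXP.
from datetime import datetime

def in_night_window(time_str, start_hour=21, end_hour=4):
    # Keep check-ins between 9pm and 4am, a window that wraps past midnight.
    hour = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S").hour
    return hour >= start_hour or hour < end_hour

print(in_night_window("2014-02-14 22:15:00"))  # True
print(in_night_window("2014-02-14 12:00:00"))  # False
```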

Then we slightly changed our target from one spot, the Prince Hotel, to one district, Houjie, another area that is also famous for this kind of industry. In this area, we made the query over one specific period after the yellow-clearance action, which gave us the original places as our set A.

```python
query = 'SELECT FROM Checkin WHERE lat BETWEEN 22.929935 AND 22.961751 AND lng BETWEEN 113.639837 AND 113.693017 AND time BETWEEN "2014-09-03 03:00:00" AND "2014-09-04 04:00:00"'
records = client.command(query)  # run the query before counting results
numListings = len(records)
print 'received ' + str(numListings) + ' Checkins'
```

The next thing we did was use the 60 results from the last step to look up the user profiles that issued these Weibo check-ins, and then use the TRAVERSE command to build a new query looking for the other places this group of people goes. In this way, we built up the connection between our original places (set A) and the extended places (set C).

```python
uniqueUsers = []
originPlaces = []
connectedPlaces = []

output = {"type": "FeatureCollection", "originP": [], "connectedP": []}

for record in records:
    user = str(record.out)

    # Skip users we have already traversed.
    if user in uniqueUsers:
        continue
    uniqueUsers.append(user)

    # Traverse from this check-in out to every Place vertex the user touched.
    places = client.command(
        "SELECT * FROM (TRAVERSE out(Checkin) FROM (SELECT * FROM {})) "
        "WHERE @class = 'Place'".format(record.out))
    print 'received ' + str(len(places)) + ' connected places from ' + str(record._in)
```
# Server Back End [Guangyue Cao, Cameron Cortez, Noele Anna Illien]


## Describe how the server interacts with the client, including the arguments that are sent from the client and received by the server at the beginning of the request.

Our goal is to query data from a certain area. The area is simply defined by the window size of the front-end web page. So the first request sent from the client and received by the server carries coordinates in the format “(lat1, lat2, lng1, lng2)”, and the server prints out the result:

```python
print "received coordinates: [" + lat1 + ", " + lat2 + "], [" + lng1 + ", " + lng2 + "]"
```

Meanwhile, the server sends a message back to the menu panel on the client side:

```python
q.put("received coordinates: [" + lat1 + ", " + lat2 + "], [" + lng1 + ", " + lng2 + "]"
      + " Yellow Detective is querying the data....")
```

Then the server queries the database for data within that latitude/longitude range and within a certain period of time, prints out the number of Checkins received in this area, and gets the user ids from these Checkins.

The next step is to get the connected places, that is, the other places these users have checked in. While running this query, the server sends real-time messages back to the client side:

```python
q.put("received " + str(len(places)) + " connected places from " + str(record._in))
```

For each original place and connected place, the server stores the information in separate lists in JSON format.

For original places, we store the latitude and longitude of the Checkin points and the text posted along with the Checkins;
for connected places, we store the title, category (type of the place), and the latitude & longitude information.

What’s more, we want to connect the origin points and the related points with lines. In our case, the start point is the original place and the end point is the connected place. We build up the line information in the form (x1, y1, x2, y2).
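As a sketch, the line-building step might look like the following, with hypothetical dicts standing in for the database records; `build_lines` and its inputs are illustrative, not the project's actual code:

```python
# Build (x1, y1, x2, y2) line records from an origin place to each connected place.
def build_lines(origin, connected_places):
    lines = []
    for place in connected_places:
        # One segment from the original place to each connected place.
        lines.append({"coordinates": [origin["lat"], origin["lng"],
                                      place["lat"], place["lng"]]})
    return lines

origin = {"lat": 22.94, "lng": 113.66}
connected = [{"lat": 22.27, "lng": 114.17}, {"lat": 23.13, "lng": 113.26}]
lines = build_lines(origin, connected)
print(lines)
```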

In the end we return these three sets of information to the client side using the code:

```python
output["features"].append(originPlaces)
output["features1"].append(connectedPlaces)
output["lines"] = lines
```

and send the message back to the menu panel:

```python
q.put("Done! Received " + str(numListings) + " Checkins, Yellow Detective is having a rest")
```


## Describe how the server interacts with the database, including the query that is sent to the database.

The query sent to the database selects the Checkins from a given geographic area during a certain period of time.
The area is defined by latitude and longitude values extracted from the window size of the front-end web page.
The time is currently fixed between 2013-12-03 21:00:00 and 2013-12-04 04:00:00. (In the future, we want the time in this tool to be defined by number input from the web page, so people can get results for any time they want.)

```python
query = 'SELECT FROM Checkin WHERE latitude BETWEEN {} AND {} AND longitude BETWEEN {} AND {} AND time BETWEEN "2013-12-03 21:00:00" and "2013-12-04 04:00:00"'
```
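Since the section anticipates letting users supply the time window from the web page, here is one sketch of how that could be folded into the query string; `build_query` and its validation step are assumptions for illustration, not the project's code:

```python
# Build the Checkin query with user-supplied bounds and times.
from datetime import datetime

def build_query(lat1, lat2, lng1, lng2, start, end):
    # Reject malformed timestamps early instead of sending them to the database.
    for t in (start, end):
        datetime.strptime(t, "%Y-%m-%d %H:%M:%S")  # raises ValueError if bad
    return ('SELECT FROM Checkin WHERE latitude BETWEEN {} AND {} '
            'AND longitude BETWEEN {} AND {} '
            'AND time BETWEEN "{}" AND "{}"').format(lat1, lat2, lng1, lng2, start, end)

q = build_query(22.53, 22.56, 114.04, 114.08,
                "2013-12-03 21:00:00", "2013-12-04 04:00:00")
print(q)
```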

## How are the results of the query processed and formatted for sending back to the client?

To process the query results, we have 5 main steps.
Step 1: Get Checkins from the given area and time period.
Step 2: For each Checkin, get the user id.
Step 3: Skip repeated users.
Step 4: Find connected places for each Checkin.
Step 5: Store the information of the original places/Checkins and the connected places separately in JSON format for sending back to the client side.
The JSON formats we established are as below:

```python
originPlaces = {"type": "Feature", "properties": {}, "geometry": {"type": "Point"}}
connectedPlaces = {"type": "Feature", "properties": {}, "geometry": {"type": "Point"}}
lines.append({'coordinates': [record.lat, record.lng, place.lat, place.lng]})
```

This information is sent to the client side using this code:

```python
output["features"].append(originPlaces)
output["features1"].append(connectedPlaces)
output["lines"] = lines
return json.dumps(output)
```
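The five steps above can be sketched end to end on fake in-memory data. The record structure, the `connected` lookup, and all coordinate values here are assumptions for illustration, not the real Weibo schema:

```python
# End-to-end sketch: checkins -> dedupe users -> connected places -> JSON payload.
import json

checkins = [  # Step 1: Checkins already fetched for the area/time window.
    {"user": "u1", "lat": 22.94, "lng": 113.66},
    {"user": "u1", "lat": 22.95, "lng": 113.67},  # repeat user, skipped later
    {"user": "u2", "lat": 22.93, "lng": 113.65},
]
connected = {"u1": [{"title": "mall", "lat": 23.13, "lng": 113.26}],
             "u2": [{"title": "pier", "lat": 22.27, "lng": 114.17}]}

output = {"type": "FeatureCollection", "features": [], "features1": [], "lines": []}
seen = set()
for rec in checkins:
    user = rec["user"]           # Step 2: get the user id.
    if user in seen:             # Step 3: skip repeated users.
        continue
    seen.add(user)
    output["features"].append({"type": "Feature", "properties": {},
                               "geometry": {"type": "Point",
                                            "coordinates": [rec["lng"], rec["lat"]]}})
    for place in connected[user]:  # Step 4: connected places for this user.
        output["features1"].append({"type": "Feature",
                                    "properties": {"title": place["title"]},
                                    "geometry": {"type": "Point",
                                                 "coordinates": [place["lng"], place["lat"]]}})
        output["lines"].append({"coordinates": [rec["lat"], rec["lng"],
                                                place["lat"], place["lng"]]})

payload = json.dumps(output)     # Step 5: serialize for the client.
print(len(output["features"]), len(output["features1"]), len(output["lines"]))
```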



# Client Front End [Jiachuan Wu, Siu Tan Wong, Ziyang Zeng]

## Describe the front-end User Interface (UI). What options or parameters are available to the user?

According to the server-side data, the different layers of Weibo check-in data fall within the geo coordinates latitude 22.929935 to 22.961751 and longitude 113.639837 to 113.693017, and the time range 2014-09-03 03:00:00 to 2014-09-04 04:00:00. Through the density of these Weibo check-in points and the lines connecting different sets of check-in points, the user can understand the spatial relationship between Houjie and other spaces in Dongguan.
Basically, two types of UI have been created: a data visualization and an interactive trip map. In the data visualization, the user can see the sequence of check-in points in Houjie, the check-in points in other spaces by the same Weibo users, and the connections between these spaces per Weibo user. Apart from the static map, a mouse-over function has been created so the user can see the details of the check-in points they choose.

## Describe the general User Experience (UX) story or narrative you envision for your MVP.

Going through the representations above, we intend to create a UX in which clients can clearly understand the density of connections between different spaces, such as shopping malls, residences, and so on, and our identified sites in Houjie. Rather than just showing statistics or infographics, the static and interactive maps we created present an easily understood image of the spatial connections between one area and others. More than mapping, by using the check-in data, this platform also visualizes the social connections from one space to other spaces. This is our MVP, a prototype that layers and subsets Weibo check-in data as human movements and preferences in order to visualize spatial connections between different city nodes within the urban area.

## Reproduce and explain in detail any requests you are sending to the server, including any arguments in the query string, and how they communicate the decisions the user has made in the UI

To achieve our goals, the data query and the data visualization communicate at a high rate. First, we wanted to tackle a specifically ‘yellow’-related geo visualization, finding a direct way to point out all the ‘yellow’ points and show their spatial movements. The client side first requested the Weibo check-in data filtered by time and by keywords that include ‘yellow’-related sentences, in order to get those specific check-ins. However, these requirements constrained the dataset so much that only a few check-ins were found. As our MVP changed to simply using Weibo check-in data at different times by the same users, the client side requested one check-in data set at one time, and another check-in data set by the same users at another time.

## Describe any further processing that is done to the data received from the server in the front end.

## Describe how the data is visualized using JavaScript/D3. How does the visualization work to communicate the important insights of the data to the user?

We use different circle attributes for the set of original points and the set of end points, rendering the original points red and the end points black, and we use thin lines to connect the original and end points to show the paths between check-ins.
Then, following our narrative, we use transitions to build up a sequence of representation, setting durations for circles and lines so the visualization unfolds from original points to end points, line by line.
Due to the heavy data load, we also add a “loading” area to tell the user whether the data has loaded and the graphics are shown completely. This is based on user experience, to make sure people know the dataset is loading.
File renamed without changes.
Binary files added: Final Images/hongkong 1.png, hongkong 2.png, hongkong 3.png, houjie 1.png, houjie 2.png, houjie 3.png, video.mov; Process Images/boundary debug.jpg, event debug.jpg, place debug.png, point projection debug.jpg, try to add time.jpg.
README.md deleted (5 changes: 0 additions & 5 deletions).