Merged
8 changes: 6 additions & 2 deletions content/news/posts/image-dataset-explorer/index.md
@@ -9,7 +9,7 @@ by {{< profile id="lea" >}}

<br>

- **A new digital tool helps HASS researchers analyse collections of images.**
+ **A [new digital tool](https://github.com/Language-Research-Technology/image-dataset-explorer) helps HASS researchers analyse collections of images.**

Humanities researchers can unlock a powerful source of cultural data by working with image collections. Images proliferate increasingly online, and carry meaning and currency in the digital world. They are also sites for language — [think of the thousands of neon signs or street names a collection of pictures might capture](https://www.youtube.com/watch?v=hPjzI_4pNug).

@@ -49,7 +49,7 @@ The second problem relates to researchers interested in how online algorithms ha

<br>

- The notebook produces visualisations by generating different types of 'image embeddings'. An image embedding condenses the features of an image, such as colour, composition and arrangement of objects, into a small numerical representation which can then be compared. An image embedding is what lets us know that two images with crowds of people are more similar than a photo of a crowd and a photo of a forest. These embeddings come out of algorithms that aim to tell us things like ‘This image has trees in it', or ‘This image depicts a crowd of people’. The Image Dataset Explorer uses colour histograms, [VGG algorithms](https://www.sciencedirect.com/topics/computer-science/vgg-19-convolutional-neural-network) and [OpenAI's CLIP model ](https://openai.com/index/clip/)to produce image embeddings that begin to map out the internal structure of the dataset. By choosing different types of embeddings, the notebook allows users to make comparisons between how different embedding methods “see” images.
+ The notebook produces visualisations by generating different types of 'image embeddings'. An image embedding condenses the features of an image, such as colour, composition and arrangement of objects, into a small numerical representation which can then be compared. An image embedding is what lets us know that two images with crowds of people are more similar than a photo of a crowd and a photo of a forest. These embeddings come out of algorithms that aim to tell us things like ‘This image has trees in it', or ‘This image depicts a crowd of people’. The Image Dataset Explorer uses colour histograms, [VGG algorithms](https://www.sciencedirect.com/topics/computer-science/vgg-19-convolutional-neural-network) and [OpenAI's CLIP model](https://openai.com/index/clip/) to produce image embeddings that begin to map out the internal structure of the dataset. By choosing different types of embeddings, the notebook allows users to make comparisons between how different embedding methods “see” images.
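To illustrate the idea of an embedding, here is a minimal sketch of the simplest of the three approaches, a colour histogram: each image is condensed into a small vector of per-channel colour counts, and vectors are compared with cosine similarity. This is an illustrative toy, not the notebook's actual implementation, and the pixel data is made up for the example.

```python
from math import sqrt

def colour_histogram(pixels, bins=4):
    """Condense an image (a list of (r, g, b) tuples, values 0-255)
    into a small numerical vector: a normalised colour histogram
    with `bins` buckets per channel. A toy sketch of the
    colour-histogram embedding idea."""
    hist = [0] * (bins * 3)
    width = 256 // bins
    for r, g, b in pixels:
        hist[min(r // width, bins - 1)] += 1
        hist[bins + min(g // width, bins - 1)] += 1
        hist[2 * bins + min(b // width, bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in hist]

def cosine_similarity(a, b):
    """Compare two embeddings: values near 1.0 mean very similar
    colour profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

# Invented 100-pixel 'images': two mostly-green forests and one grey crowd.
forest_a = [(30, 180, 40)] * 90 + [(90, 60, 20)] * 10
forest_b = [(20, 160, 50)] * 80 + [(100, 70, 30)] * 20
crowd = [(120, 120, 130)] * 100

sim_forests = cosine_similarity(colour_histogram(forest_a), colour_histogram(forest_b))
sim_mixed = cosine_similarity(colour_histogram(forest_a), colour_histogram(crowd))
# The two forest images score as more similar to each other than to the crowd.
print(sim_forests > sim_mixed)
```

VGG and CLIP embeddings work the same way at the comparison stage, but the vectors capture learned features (objects, composition, even text in the image) rather than raw colour, which is why swapping the embedding method changes what the visualisation treats as "similar".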

<br>

@@ -63,6 +63,10 @@ Sam identified a further crossover for language research — the approaches take

<br>

#### [Get started with the Image Dataset Explorer](https://github.com/Language-Research-Technology/image-dataset-explorer)

<br>

<a name="fn-1">1</a> Thanks to Simon Musgrave and Teresa Chan for their helpful comments on this blog post. [↩](#back-1)

<br>